The way forward is through Paris


Leaders must come together on a solid agreement at the United Nations climate conference — and 
then get to work at home by meeting commitments and finding new ways to reduce emissions. 


The world’s leaders left a fabulous mess in their wake after making
a brief appearance at the last major global climate summit. The
2009 Copenhagen negotiations descended into an angry free-for-all,
although one basic idea was agreed: that countries, rich and poor,
need to step forward with their own climate solutions. This idea stuck
and is now at the heart of the negotiations going into the United Nations
Paris Climate Conference, where countries will attempt to forge the
first ever fully fledged international climate agreement. Nature offers
a package of stories and commentaries this week (see
nature.com/parisclimate) previewing what many expect to be the biggest
step so far towards controlling global greenhouse-gas emissions.

That optimism should not be taken as a sign that all is well. Last year
was the warmest on record. This year will be warmer still, with average
temperatures expected to reach more than 1 °C above pre-industrial
levels. An array of impacts are already being documented around the
globe, including melting ice, decreasing crop yields and shifting
animal-migration patterns. And yet, despite a quarter of a century of
increasingly desperate debate, greenhouse-gas emissions continue to rise.

We know that any deal emerging from the Paris conference will
not solve the problem. Even if nations follow through on the climate
pledges that have been made so far, global emissions are projected to
rise until at least 2030, and temperatures could reach 2 °C above
pre-industrial levels as early as 2032. The UN has set the goal of limiting
any rise to 2 °C, but even this increase would not protect the world’s
most vulnerable citizens from rising tides, extreme weather and shifting
precipitation patterns.

PLETHORA OF PLEDGES 
But there are reasons for optimism. Foremost is the fact that a solid
majority of nations, accounting for roughly 91% of global emissions,
have submitted climate pledges. Many, including those of all developed
countries, feature commitments to curb greenhouse-gas emissions.
Others, from a plethora of developing countries, focus on sustainable
development and adaptation to the impacts of rising temperatures. Even
with financial and technological aid, emissions will continue to rise in
these countries as governments seek to lift their people out of poverty.

All told, the world’s pledges fall short. But for the first time,
governments are moving forward collectively; as David Victor and James
Leape point out in their Comment on page 439, that is the first step.

Although many countries want to make these commitments binding
under international law, they will remain voluntary, at least for now. The
US Senate’s aversion to international treaties is often blamed, but many
countries worry about binding commitments given the difficulty of the
economic transition that is required. The 1997 Kyoto Protocol included
binding commitments from most developed nations — notably excluding
the United States — but many developed countries received a free pass.
And there were no real consequences for those that did not live up to
their obligations.

The focus now is on building a ‘pledge-and-review’ system that
pushes countries to submit their own national commitments, which
are then up for review by other governments and groups. There is
some evidence that this ‘institutionalized peer pressure’ can work:
175 countries have voluntarily submitted pledges so far.

Economic and political momentum is building. Renewable energy
is growing faster than anybody projected just a few years ago. The
consultancy Bloomberg New Energy Finance has projected that renewables
will account for two-thirds of the US$12 trillion that will be invested
in electricity generation over the next 25 years. Brazil has made huge
progress in reducing deforestation, and the palm-oil industry has
committed to reduce deforestation in Indonesia and other countries. The
countries of the Organisation for Economic Co-operation and Development
agreed on 18 November to restrict financing for coal-fired power plants,
and the United Kingdom is weighing up a proposal to shut down all of its
coal plants by 2025. In the United States, coal is on the ropes thanks to
a combination of regulation and cheap natural gas.

In Paris, negotiators must provide a strong framework for reporting
and verifying climate pledges. Governments, scientists and
environmentalists need solid information about who is doing what. And the
agreement should require a five-year review process so that governments
can identify ways to go even further at the next major climate
summit in 2020. Once everybody is pointed in the right direction, the
hope is that human ingenuity will kick in, and the world will discover
ways to reduce emissions more quickly.

As reported in our News Feature on page 436, however, limiting the
temperature rise to 2 °C will be difficult. Barring premature retirement
of much of the existing fossil-fuel infrastructure, the only way to get
there will be to overshoot the target and then bring atmospheric carbon
dioxide concentrations back down later in the century. Unless engineers
figure out a simple way to pull CO2 out of the atmosphere, this probably
means deploying bioenergy at massive scales, capturing the CO2 that is
emitted during energy production and pumping it underground.

One day, governments may decide that measures such as extreme
decarbonization are necessary. In the meantime, scientists must
investigate the social, political and economic realities ahead and research
the consequences of rising emissions, including potentially
catastrophic shifts in the climate system.

In Paris next week, world leaders must come together and signal the
way forward for their governments, their citizens and for businesses and
investors. If humans want to keep living on a planet that looks, feels
and functions like the one we live on now, it is time to sign an
agreement and get to work.

Built on trust


Written agreements between parties in research 
collaborations are not a sign of a lack of faith.


A scuffle that has riled the Chinese scientific community could
have been avoided if the parties involved had hammered out
the details of their collaboration beforehand.

On 16 November, researchers at Peking University in Beijing
claimed discovery of a biological-compass mechanism that could
explain how some animals sense magnetism (S. Qin et al. Nature Mater.
http://doi.org/89v; 2015). But some of the paper’s thunder was stolen
by a researcher at Tsinghua University, also in Beijing, who reported
in September how the same mechanism could be used to manipulate
neurons in worms (X. Long et al. Sci. Bull. http://doi.org/883; 2015).

When the September paper was published, the lead Peking
University researcher cried foul, claiming that his Tsinghua colleague
had agreed not to publish until the Nature Materials paper came out
(see Nature http://doi.org/9gg; 2015). University administrators got
involved, the Tsinghua researcher was fired, and his graduate student,
whose career has been upended, circulated a plea for support to
China’s scientific community. The Peking researcher has called for his
rival’s paper to be retracted. Both parties have mustered e-mails and
other correspondence to show that the facts are on their side.

A detailed, formalized agreement could have prevented this. When
embarking on a collaboration, it can be hard to ask a scientific peer
to sign a contract. Lawyers get involved, making it cumbersome and
costly. Fencing off rights to patents, authorship, publication and
decision-making authority can be tedious and can cause tension. A
simple handshake is much more comfortable.

This is true for researchers around the world. But in China, where 
people are finely tuned to what might make them lose face, the bar 
is especially high. Asking someone to sign such an agreement feels 
equivalent to saying that you don’t trust them. 

A survey of Chinese researchers undertaken by Nature Publishing
Group supports that observation. Scientists who had worked abroad
were asked about the differences in the working environment in China
compared with that in other countries, including the ease of carrying
out collaborations. Some noted that Chinese researchers usually do not
ask for formal agreements. The reason might be cultural, but it could
also be that most universities and research organizations in China do
not have the personnel to support this function.

The survey results appear in a 26 November report, Turning Point: 
Chinese Science in Transition (see go.nature.com/ybsatt and
go.nature.com/fdwacj; in Chinese). The biggest hindrance to harmonious
collaboration, according to interviewees, was tension over author- 
ship — a factor that plays a substantial part in the dispute over the 
biological-compass papers. In China, assessors of a researcher’s 
achievements focus on papers in which the individual is first or corre- 
sponding author. The report suggests that research assessment should 
take a more balanced approach, and that policymakers can iron out 
some of these wrinkles. 

It is clear that university administrators can help collaborations 
by providing personnel to deal with the legal aspects. It might be 
a burden in the short term, but in the long term it would encour- 
age collaboration. Scientists with valuable knowledge who want to 
protect their rights to priority in publication, patents and other areas 
deserve as much.




Drugs on demand 


Controversy in Brazil over access to a purported 
cancer cure could set a harmful precedent. 


A furious debate that is raging in Brazil pits the nation’s largest
university against hundreds of cancer patients who want
access to a compound that some have branded a miracle cure.

But whether the compound holds any benefits at all remains to be
seen: it has never been evaluated in human trials. The conflict is an
extreme version of a debate that has gone on in the United States and
elsewhere, as terminally ill people whose diseases have withstood
modern medicine’s proved arsenal have demanded access to untested
treatments.

As we report on page 420, courts in Brazil have previously
sympathized with those demands, ordering the University of São Paulo
to provide a compound called phosphoethanolamine to hundreds of
patients. People on both sides of this debate are armed with good
intentions. The university argues that the drug is untested, and should not
be used to give false hope — and unknown side effects — to vulnerable
patients. On the other side, it is understandable that people with little
hope may prefer the uncertainty of an untested drug to the certainty of
a terminal illness.

But there are also concerning reports that some people with
cancer are not taking their prescribed medications, for fear that
scientifically proven medicine may interfere with the supposed
miracle of phosphoethanolamine. The tenor of the debate has also
been harmful at times, with some phosphoethanolamine advocates
accusing the government or the pharmaceutical industry of actively
suppressing further development of the drug.

The sad truth is that the drug is unlikely to be a miracle. In the 
United States, for example, only one in ten drugs that make it to 
phase I clinical trials are destined to gain approval from the US Food 
and Drug Administration (FDA). And phosphoethanolamine has not 
even made it that far: its promise is backed up by a few publications 
based on lab and animal tests. 

Even so, terminally ill patients may be willing to try a treatment 
with only the slimmest odds of success. In the United States, several 
states have passed laws that, to varying degrees, grant such patients 
the right to try experimental drugs outside the purview of the FDA. 
The laws have triggered debates of their own, and have come under 
fire for offering false hope and for potentially leading patients away 
from other, more promising avenues. 

The situation in Brazil is more extreme. A university laboratory is 
neither a pharmaceutical plant nor a pharmacy; it is not required to 
follow good manufacturing protocols. There is no oversight to certify 
what is going into the blue-and-white phosphoethanolamine capsules 
produced at the University of São Paulo. Neither the compound’s side
effects nor its efficacy are systematically monitored. To order a uni- 
versity to supply a drug is to show a disregard for the importance of 
all these safety measures. 

The hope of phosphoethanolamine lies in further research. Federal 
funders in Brazil have said that they will support further preclini- 
cal studies of the drug. Researchers are pursuing options for moving 
the compound into clinical trials, should those animal studies suc- 
ceed; patients who are interested in pursuing phosphoethanolamine 
treatment could enrol in the clinical tests. In the meantime, the
courts should liberate patients from the legal tug-of-war and uphold
the latest decision to halt distribution of phosphoethanolamine until
its potential is better understood.
WORLD VIEW


A ‘perfect’ agreement in Paris is not essential


Success at the latest climate talks will be a recognition by the world’s nations
that incremental change will not do the job, says Johan Rockström.


So here we go again. Nations are meeting in Paris for their twenty-first
attempt to agree on decisive action to avoid what the United
Nations defines as dangerous climate change.

The climate negotiations have set this danger threshold at
1.5-2 °C of global warming above pre-industrial levels. With such
a guard rail established, the required components of a ‘successful’
climate deal more or less fall into place. A reasonable chance of
attaining 2 °C translates to a finite global carbon budget of about
900 gigatonnes of carbon dioxide from 2015 onward that must be
shared in a fair way between all nations.

Can and should the Paris talks deliver an agreement that gives a
binding commitment from all nations to meet this outcome? The last
time the world gathered for a decisive global agreement on climate
change, in Copenhagen in 2009, the remit was that, yes, world leaders
needed to do nothing less than decide on a global, legally binding
agreement that met the scientific targets of a safe and just future
below 2 °C.

But since Copenhagen, the global discourse has changed. In 2009, it
was possible to show convincingly only that we needed to tackle
the climate challenge; it was not easy to show that it was possible.
Today, the need is more apparent than ever. And, more importantly,
there is ample evidence that scaling up economically competitive,
clean-energy solutions is possible.

Before Copenhagen, economists generally thought that a high oil price
was the best way to enable a transition to a decarbonized future.
The surprising reality is that low oil prices
seem to be the most effective way of ensuring a transition away from
fossil fuels. Renewable energy systems compete even at low oil prices,
which in turn closes the door on unconventional, expensive oil, such
as offshore oil and exploitation in difficult environments such as the
Arctic. It also opens a unique window to introducing a global price on
carbon — clearly the most effective policy measure for accelerating
the transition to fossil-fuel-free energy.

Experience across industrial sectors shows that new solutions can
scale up and become part of the mainstream in markets and societies
only once they have penetrated at least 15-20% of the marketplace or
society. For renewable energy, this penetration has been achieved in
enough countries only in the past three to four years.

In this new situation, is it possible to envisage a transformation to
a decarbonized world by around 2050 even if Paris does not deliver the
‘perfect’ agreement? The answer is yes. To get there, the threshold
for success in Paris should not be at the level of ‘resolving the
climate problem’ through incremental change, but rather the assurance
that the world is serious about a transformation. We need an agreement
that is good enough to tip the world decisively towards rapid
decarbonization. A new treaty does not need to force nations into
compliance, but rather should create confidence and send the right
signal — to investors, businesses and societies at large — that the
global political leadership is turning irrevocably towards a new
sustainable era.

How ambitious must the Paris agreement be to decisively support
such a trajectory? To meet the 2 °C limit, the world must cut carbon
emissions at about 6% per year. National pledges on the table at Paris
will not get us close. From experience, we know that emissions cuts in
the range of 0-2% per year are within the realm of incremental policy
measures. A range of 2-3% requires ambitious adaptation. Once levels
exceeding 3-4% are reached, experience indicates that radical measures
are needed, such as carbon taxes and the phasing out of coal power.

These are the kinds of changes needed to decarbonize the world economy,
and above all, to send clear signals of a shift from incremental
to transformative change. Success in Paris should thus be viewed as an
agreement that corresponds to a pace of emissions cuts of greater than
3-4% per year, starting in the 2015-20 window.

In turn, this would suggest that Paris must accumulate 80% of the
national pledges needed to stay within the 2 °C guard rail, with at least
20% of the countries committing to more than 4% average cuts per year,
to create a large enough critical mass of nations committed to
decarbonization and to influence the global logic
(see go.nature.com/luxlyn). Achieving this goal
is ambitious but realistic. And it comes with a decent chance that,
once nations realize the benefits of decarbonization, they will
increase their pledges. It is crucial, therefore, that the Paris agreement
allows for recurrent recalibration of the pledges, at least every third
or fifth year.

It would be dangerous to allow ‘success’ to be reduced to a low
level of political achievement so that the world continues along an
incremental policy path that stands no chance of supporting a
transition to decarbonization. Equally, scientists can no longer dismiss
as failure an agreement that is not fully in line with the demands
of climate science. For if Paris is widely perceived to have failed,
political leadership is likely once again to enter a post-Copenhagen
climate trauma and instead focus on other more urgent (and politically
rewarding) issues.

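
As a rough check on the column's carbon-budget arithmetic, the sketch below sums emissions that fall by a fixed fraction every year and compares the total with the budget of about 900 gigatonnes of carbon dioxide quoted in the column. The starting rate of 36 gigatonnes per year, the function names and the 200-year horizon are illustrative assumptions, not figures from the column, and this simple geometric model is not the method behind the column's 6% figure.

```python
# Hedged sketch: cumulative CO2 emitted when annual emissions shrink by a
# fixed fraction each year. Every input except the 900 Gt budget is an
# illustrative assumption, not a value taken from the column.

BUDGET_GT = 900.0          # carbon budget from 2015 onward, as quoted in the column
START_GT_PER_YEAR = 36.0   # ASSUMPTION: rough starting emission rate


def cumulative_emissions(start_gt_per_year, annual_cut, years):
    """Total emissions (Gt) over `years`, where each year's emissions are
    (1 - annual_cut) times the previous year's and the first year emits
    the full starting rate."""
    total = 0.0
    current = start_gt_per_year
    for _ in range(years):
        total += current
        current *= 1.0 - annual_cut
    return total


def years_until_budget_spent(start_gt_per_year, annual_cut, budget_gt, horizon=200):
    """Number of years after which cumulative emissions first exceed the
    budget, or None if that does not happen within `horizon` years."""
    total = 0.0
    current = start_gt_per_year
    for year in range(1, horizon + 1):
        total += current
        if total > budget_gt:
            return year
        current *= 1.0 - annual_cut
    return None


if __name__ == "__main__":
    for cut in (0.0, 0.02, 0.04, 0.06):
        total = cumulative_emissions(START_GT_PER_YEAR, cut, 200)
        spent = years_until_budget_spent(START_GT_PER_YEAR, cut, BUDGET_GT)
        print(f"cut {cut:.0%}: {total:.0f} Gt over 200 years, "
              f"budget exceeded after {spent} years")
```

Under these assumed inputs the geometric series caps total emissions at the starting rate divided by the annual cut, so a steady 6% cut gives about 600 Gt and a steady 4% cut about 900 Gt; a different starting rate or a delayed start would shift those totals.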
RESEARCH HIGHLIGHTS
Selections from the scientific literature


METABOLISM


Gastric surgery alters sweet tooth


Some weight-loss surgeries can diminish cravings for sweets by altering
the brain’s response to the neurotransmitter dopamine.

Ivan de Araujo of Yale University in New Haven, Connecticut, and his
colleagues studied the effects of a duodenal-jejunal bypass, which
reroutes food from the stomach directly into the middle part of the
small intestine. They found that well-fed mice that did not have the
surgery consumed more sugar after previous repeated exposure to sweets.
Mice that had the surgery did not develop the same sweet tooth.

Sugar consumption led to the release of dopamine, which is involved in
reward responses, particularly when the sugar was administered to the
upper region of the intestines (the area bypassed in the surgery).
Activating dopamine-sensing neurons restored the sweet cravings in
mice that had undergone the surgery.

Cell Metab. http://doi.org/9dm (2015)


Toads saved from killer fungus


Biologists have rid a wild toad species of a lethal fungal disease that
threatens amphibians around the world.

The chytrid fungus Batrachochytrium dendrobatidis has wiped out many
species of frogs and toads. Jaime Bosch at Spain’s National Museum of
Natural History in Madrid and his team removed tadpoles of the midwife
toad (Alytes muletensis; pictured) from ponds on the Spanish island
of Mallorca and treated them in the lab with a drug that kills
the fungus. They also drained the ponds and sprayed them with a
disinfectant before returning the tadpoles. The fungus disappeared in
four out of five treated ponds for two years.

The method may work only in some habitats, the authors say.

Biol. Lett. 11, 20150874 (2015)


ZOOLOGY


Mollusc sees with its shell


A marine mollusc has hundreds of eyes in its armour that can see images.

Christine Ortiz at the Massachusetts Institute of Technology in
Cambridge and her colleagues studied the structural, optical
and mechanical properties of the eyes of Acanthopleura granulata
(pictured) using various experimental and computational
techniques. Unlike in most animals, the microscopic lenses are not
organic, but are made of the mineral aragonite. These minimize light
scattering because they are made of large and aligned crystals.
Projecting images through the lenses showed that they
could resolve an image of a potential predator
of around 20 centimetres in size from about 2 metres away.

The shells are much weaker at these points
than elsewhere, but the organism has evolved
ways to compensate for the structural weakness, the team found.

Science 350, 952-956 (2015)


Snow-fed water supply threatened


The southwestern United States, the Iberian Peninsula and parts of the
Middle East and other regions are at risk of seasonal water shortages
resulting from decreasing snowfall in a warming climate.

Justin Mankin at Columbia University in New York and his colleagues
looked at projections from various climate models to determine
how warming might affect snowfall and river run-off in
more than 400 large basins in the Northern Hemisphere.
The team identified a dozen or so snow-sensitive basins
that, across all climate models, face an 80-100% risk
of declining water supply in the coming decades. Each
of the sensitive basins has a current population of more
than 1 million people — including the Rio Grande
basin spanning Texas and Mexico, the Ebro–Duero
basin in Spain and the Asi basin in Lebanon and Syria.

Environ. Res. Lett. 10, 114016 (2015)


NUTRITION 


Personalized 
diets for health 


People who eat identical meals 
display different blood glucose 
levels afterwards, thanks in 
part to differences in their gut 
microbes. 

Large spikes in blood glucose 
after eating increase the risk 
of type 2 diabetes, so dietary 
guidelines rank foods based 
on their glycaemic index — an 
indicator of their effects on 
blood glucose. Eran Elinav and 
Eran Segal of the Weizmann 
Institute of Science in Rehovot, 
Israel, and their colleagues 
continuously monitored 
the diets and lifestyles of 
800 people over a week, and 
found that meals with the 
same glycaemic index caused 
widely different glucose levels 
in participants. By analysing 
data on the participants’ gut 
microbiomes, physical activity 
and other clinical factors, the 
team created personalized diets 
for 26 people and found that 
these resulted in lower glucose 
levels after meals than did 
non-personalized diets. 

The study could partly 
explain the limited efficacy of 
universal dietary guidelines, 
the authors say. 

Cell 163, 1079-1094 (2015) 


Lasers reveal 
quantum jitters 


Ultrafast laser pulses can be 
used to detect the motion of 
a single atom, from energetic 
wiggles to quantum jitters. 
Kale Johnson at the 
University of Maryland 
in College Park and his 
colleagues trapped ions 
of ytterbium and zapped 
them with laser pulses 
just 10 picoseconds long. 
The pulses gave the atom 
small kicks in momentum 
of different magnitudes, 
depending on its internal 


state. This resulted in a
new state that encoded the 
atom’s original motion. After 
another sequence of pulses, 
the researchers observed 
fluorescent light from the 
atom that allowed them to 
measure its quantum motion. 
The technique could be 
useful for future quantum 
computers built from trapped 
ions, the team says. 
Phys. Rev. Lett. 115, 213001 
(2015) 


Flower given 
digital power 


Researchers have incorporated 
electronic circuitry into the 
tissues of a rose. 

Magnus Berggren at 
Linköping University in
Norrköping, Sweden, and
his colleagues submerged
the cut end of a rose stem
into a water-based solution 
of PEDOT, a conducting 
polymer that is used in 
printable electronics. 
Capillary action pulled the 
polymer up into the rose’s 
vascular tissue, where it came 
out of solution and self- 
assembled into wires, some 
as long as 10 centimetres. By 
attaching gold probes coated 
with PEDOT to the wires, the 
researchers made individual 
transistors and demonstrated 
a simple digital circuit. 

The transistors’ electrical 
performance was on a par 
with that of conventional 
printed PEDOT circuits. 

The technology could 
eventually be used to record or 
regulate plant physiology, the 
authors say. 

Sci. Adv. 1, e1501136 (2015) 


AGRICULTURAL ECOLOGY


Complex effects of pesticides on bees


Honeybee colonies could be compensating for the harmful effects of
certain pesticides by producing more workers, at least in the short term.

Some European countries banned neonicotinoid pesticides in 2013, but
this remains controversial because field studies have failed to
confirm the adverse effects reported for bees in the lab. Mickaël Henry
at the French National Institute of Agricultural Research in
Avignon and his colleagues positioned honeybee colonies in farmers’
fields so that they were exposed to varying levels of the pesticide
thiamethoxam. The team radio-tagged and monitored
nearly 7,000 bees, and found that pesticide exposure caused
an acceleration in death rate over time.

The colonies, however, compensated for dead foragers by producing
more workers and fewer drones. This maintains
honey production but could decrease bee reproduction
in the long term. The risks of pesticides in the field may be
best understood by studying entire colony cycles, the authors say.

Proc. R. Soc. B 282, 20152110 (2015)


Martian moon will break apart


Phobos, one of Mars’s two moons, will disintegrate some
20 million to 40 million years from now, and its particles will
form the only planetary ring in the inner Solar System.

Benjamin Black and Tushar Mittal of the University of
California, Berkeley, made these predictions by analysing
tidal and other forces that are currently pulling Phobos
(pictured) towards Mars. Using a geological model
of how rock holds together, they calculated that the moon
would rip apart before it smashed into the planet. The
resulting ring would be stable for 1 million to 100 million
years, they say.

Nature Geosci. http://dx.doi.org/10.1038/ngeo2583 (2015)


SOCIAL SELECTION
Popular topics on social media


Text-mining block prompts response


A scientist who mines the text of research publications
was blocked by the scientific publisher Elsevier from
downloading large numbers of its papers — a move that
he described in a blog post that was shared by many on
social media. Chris Hartgerink, a statistician at Tilburg
University in the Netherlands, says that the publisher is
hindering his research. Elsevier allows text-mining through
the use of a specific application programming interface,
and says that this prevents its website from being slowed
down by researchers who download large amounts of data.
Frank Huysmans, a library scientist at the University of
Amsterdam, linked to the blog post on
Twitter: “How signing away copyright to
academic publishers obstructs content
mining research ... Strong case for
#openaccess #tdm.”
SEVEN DAYS
The news in brief


EVENTS


Whalers fined

An Australian court fined a Japanese company
Aus$1 million (US$724,000) on 18 November and found
the firm to be in contempt of court for killing minke whales
in an area declared a sanctuary by Australia. According
to the animal-protection organization Humane Society
International (HSI), which, along with the Environmental
Defender’s Office, brought the case against the firm Kyodo
Senpaku Kaisha, this is one of the largest fines issued under
Australian conservation law. The company caught whales
in four different years, despite a 2008 injunction against the
practice, says the HSI.


Climate repeals

The US Senate voted on 17 November to repeal a
pair of regulations by the Environmental Protection
Agency that would limit carbon emissions from new
and existing power plants. Votes on both rules were led by
Republicans and passed by a margin of 52-46; the House of
Representatives is considering similar resolutions. Coming
just two weeks before the United Nations climate summit
in Paris, the resolutions are largely symbolic. US President
Barack Obama promised to veto both repeals, and
supporters do not have the two-thirds majority needed to
override a veto.


Statoil Arctic exit

Norwegian energy company Statoil announced on
17 November that it would cease exploration for gas and
oil in Alaska’s Chukchi Sea. The decision comes just over a
month after Royal Dutch Shell suspended its own exploration
off the Alaskan coast, citing regulatory uncertainty and a
disappointing survey of the area’s fossil-fuel prospects.
The Statoil decision sees the company exit early from
16 leases that were set to expire in 2020.


Tasmanian devils returned to the wild

Tasmania has 39 more wild devils, after the latest
batch of healthy individuals was released from
the Devils Ark Sanctuary (pictured is manager
Dean Reid) onto the Forestier Peninsula on
18 November. The area was cleared of Tasmanian
devils (Sarcophilus harrisii) after an infectious
cancer that is devastating populations of the
endangered animals was detected there in 2004.
A ‘devil-proof fence’ has now been installed to
prevent the new, healthy population from mixing
with animals afflicted with the deadly and
infectious devil facial tumour disease.


L’Aquila verdict

Italy’s highest court of appeal on 20 November
upheld a decision to acquit 6 seismologists accused of
manslaughter in regard to the 2009 L’Aquila earthquake,
which killed more than 300 people. Prosecutors
claimed that the scientists misled townspeople about the
risk, leading them to stay in their homes instead of seeking
safety. The scientists were originally given six-year prison
sentences, but an appeals court in L’Aquila acquitted them
last November, and reduced to two years the sentence of
Bernardo De Bernardinis, former deputy director of
the Italian Civil Protection Department, who was also
convicted. De Bernardinis’s reduced sentence was upheld;
he still faces a separate charge of manslaughter.


Ebola setback

In a setback to efforts to end the Ebola epidemic in West Africa,
the World Health Organization announced three new cases
of the disease in Liberia on 20 November. One of those
individuals, a 15-year-old boy, died on 23 November. The
country had been declared Ebola-free on 3 September.
Sierra Leone was declared Ebola-free on 7 November,
and the last case in Guinea was reported on 29 October,
leading to hopes that the epidemic, which began in
December 2013, might finally be nearing an end.


Pandemic report 

A panel of physicians, scientists and policy experts has
called for major reforms to the World Health Organization
and other international health-response systems following
the Ebola epidemic that has killed more than 11,000 people.
The panel, convened by Harvard University in Cambridge,
Massachusetts, and the London School of Hygiene and
Tropical Medicine, released its report on 22 November
(S. Moon et al. Lancet http://doi.org/9gf; 2015). It also
recommends measures to improve prevention, detection and
response to outbreaks, and to speed research on diseases
that cause them. See go.nature.com/jxxvs6 for more.


Rare rhino dies 

Northern white rhinoceroses (Ceratotherium simum cottoni)
are one step closer to extinction, after a 41-year-old
female named Nola had to be put down after surgery at the
San Diego Zoo Safari Park in California on 22 November.
The last three remaining individuals — two females that
cannot reproduce naturally and a male with a low sperm
count — live at Ol Pejeta Conservancy in Kenya.
Conservationists hope that the species can be saved through
assisted reproduction techniques, using southern white
rhinos (Ceratotherium simum simum) as surrogates.


POLICY 


CRISPR cress

The Swedish Board of Agriculture on 17 November told two
Swedish universities that they do not need special approval
for field trials of some cress (Arabidopsis, pictured)
varieties mutated by the CRISPR-Cas9 gene-editing
technique. In June, the European Commission had asked
European Union member states to hold back on such rulings
until it makes its own proposals on how to regulate
organisms modified by new genetic techniques. But the
Swedish authority said decisions needed to be made now, so
that trials can be prepared for the next growing season.


TREND WATCH

The availability of antiretroviral drugs for HIV has
increased women’s lifespans more than men’s in
KwaZulu-Natal in South Africa, concludes a study of more
than 98,000 people (J. Bor et al. PLoS Med. 12, e1001905;
2015). Since free antiretroviral treatment became available
in South Africa in 2004, declines in life expectancy have
reversed for both genders. But progress is uneven, with
women gaining more years of life than men. The authors
recommend that HIV outreach activities be targeted to men.

SOURCE: J. BOR ET AL. PLOS MED. 12, E1001905 (2015)


Chimps retired 

The US National Institutes
of Health (NIH) is ceasing
its chimpanzee-research
programme altogether, two
years after retiring most of 
its chimps. In a 16 November 
e-mail to the agency's 
administrators, NIH director 
Francis Collins announced 
that the 50 NIH-owned 
animals that remain available 
for research will be sent to 
sanctuaries. The agency will 
also develop a plan to phase 
out NIH support for the 
remaining chimps that are 
supported, but not owned, 
by the NIH. See page 422 for 
more. 


Coal curbs 

The Organisation for
Economic Co-operation
and Development agreed
on 18 November to restrict
public financing for coal-fired 
power plants. Two years in 
the making, the agreement 
removes support for large, 
low-efficiency coal-fired 
plants while maintaining 
support for medium-sized, 
high-efficiency plants in 
countries facing energy 
shortages, and for small, less- 
efficient plants in the poorest 
countries. The restrictions 
will not apply to any coal- 
fired plants that are equipped 
to capture and store carbon 
emissions. 


FUNDING

UK research review

A tensely awaited report into the future of the major UK
research-funding agencies, released on 19 November,
suggests the creation of a powerful umbrella organization
called Research UK to manage the agencies. The review was
led by Nobel-prizewinning geneticist Paul Nurse. Many
scientists feared that it would recommend a total merger of
the research councils, which collectively distribute some
£3 billion (US$4.6 billion) of government research funding
each year. Nurse recommends that Research UK be led by an
experienced researcher, who would in effect be boss of the
heads of the seven discipline-based councils. See
go.nature.com/2rwzeu for more.


HIV LIFE EXPECTANCIES DIFFER BY GENDER

Freely available antiretroviral therapy has decreased the
mortality rate of HIV-positive women more than that of
HIV-positive men in one region of rural South Africa.


30 NOVEMBER

The leaders of the world’s nations gather to broker a
climate deal at the United Nations Paris Climate Change
Conference.
nature.com/parisclimate


1-3 DECEMBER

Washington DC hosts the International Summit on Human Gene
Editing.
go.nature.com/huzip3


2 DECEMBER

The European Space Agency's LISA Pathfinder satellite,
which will hunt for gravitational waves, launches from
Kourou, French Guiana.
go.nature.com/rxrzuc


BUSINESS

Mega-merger

Two major pharmaceutical firms are to merge in a
US$160-billion deal, they announced on 23 November. Pfizer
of New York will combine with Allergan, based in Dublin, in
a merger that is expected to be completed by the end of
2016. The resulting firm will be named Pfizer but will be
headquartered in Dublin — providing a significant tax break
for the US firm — and will have more than 100 medicines in
mid-to-late-stage development.


AquAdvantage Atlantic salmon (at back) grow to twice the size of a normal Atlantic salmon (Salmo salar) over the same time.


BIOTECHNOLOGY 


Transgenic salmon leaps 
to the dinner table 


Long-awaited decision by US government authorizes the first genetically engineered animal to be sold as food.


BY HEIDI LEDFORD 


A breed of fast-growing Atlantic salmon rocketed to
celebrity status on 19 November when it became the first
genetically engineered animal to be approved for human
consumption in the United States.

The landmark decision by the US Food and Drug
Administration (FDA) releases the ‘AquAdvantage’ salmon
from two decades of regulatory limbo — but it could also
revitalize an industry that has waited a long time for any
sign that its products might make it to market.

“It opens up the possibility of harnessing this
technology,” says Alison Van Eenennaam, an animal
geneticist at the University of California, Davis. “The
regulatory roadblock had really been disincentivizing the
world from using it.”

The FDA decision comes at a time when the US government is
re-evaluating how it regulates genetically engineered crops
and animals. On 2 July, the White House Office of Science
and Technology Policy said that it will update those
regulations — for the first time since 1992 — over the next
year. And at a meeting on 18 November, the US Department of
Agriculture (USDA) discussed preliminary plans to revise
its guidelines for genetically engineered crops.

A key driving force for these discussions is the
recognition that current regulations may not cover crops
and animals engineered using cutting-edge techniques, such
as the CRISPR-Cas9 system, that allow researchers to make
targeted changes to the genome. The USDA has already
determined that its regulations do not apply to several
genome-edited crops. Van Eenennaam says that it is still
unclear how the FDA will regulate animals that have been
engineered using that technology.

“There is a lot going on these days,” says 
Greg Jaffe, director of biotechnology at the 
Center for Science in the Public Interest in 
Washington DC. “But obviously, up until the
decision about the salmon, people were mostly
focusing on the crop side.”

AquaBounty Technologies, based in 
Maynard, Massachusetts, filed its first applica- 
tion to the FDA for approval of the salmon in 
1995. The agency completed its food-safety 
assessment in 2010, and released its environ- 
mental-impact statement at the end of 2012. 
The long delay between the completion of 
those steps and a final decision led to rumours 
of political interference. 

But Laura Epstein, a senior policy analyst for the FDA’s
Center for Veterinary Medicine, says that the approval took
so long because it was the first of its kind. “With most
products that are the first of its kind, we are very
careful,” she says. The agency also had to wade through
many public comments before it could issue a decision, she
adds.

It is unclear how the salmon will fare on the market.
AquAdvantage fish produce extra growth hormone, allowing
them to grow to market size in 18 months, rather than the
usual 3 years. In the time since AquaBounty first filed for
approval, fisheries have bred conventional salmon that grow
just as fast, says Scott Fahrenkrug, chief executive of
Recombinetics, an animal-biotechnology firm in St Paul,
Minnesota.

Then there is the matter of consumer acceptance: several
grocery chains have said that they will not carry the
salmon, which, even at full production, would amount to
only a tiny fraction of total US salmon imports. “It’s a
drop in the bucket,” says Jaffe. “Consumers would have to
hunt to find salmon that are genetically engineered, as
opposed to avoiding them.”

Still, the FDA’s approval met with swift opposition from
some environmental and food-safety groups. Although
AquaBounty uses physical and biological safeguards to
reduce the chance that its salmon will escape into the
wild, opponents fear that an accidental release could alter
natural ecosystems. They are also unhappy that the FDA will
allow the fish to be sold without any label to indicate
that it is genetically engineered.

“Huge numbers of people have said, ‘Yes, we want it
labelled,’” says Jaydee Hanson, a senior policy analyst at
the Center for Food Safety, an environmental-advocacy group
in Washington DC. “If this is such a good product, the
company itself should be saying it will label it.”

The FDA declined to comment on whether 
other applications for genetically engineered 
animals are in the regulatory pipeline. But 
Fahrenkrug says that his company is develop- 
ing several such animals, including cattle that 
do not have to be dehorned and pigs that do 
not need to be castrated. 

Recombinetics’ animals are engineered 
using genome-editing techniques that Fahren- 
krug argues do not require FDA approval. The 
agency regulates animals that are engineered
using a ‘recombinant DNA construct’, but
his animals are modified by injecting protein
and RNA into embryos. “It’s a treatment, not a
transgene,” he says.

The FDA has yet to announce how it will view
such animals, but Fahrenkrug takes approval of
the salmon as a sign that the agency is willing
to allow them onto the market. “I’m feeling
optimistic now,” he says.


PARIS CLIMATE TALKS 


Pledges raise hopes 
ahead of climate talks 


Momentum builds for a new treaty as world leaders prepare to descend on Paris.


BY JEFF TOLLEFSON 


The road to a new global climate treaty has been slow and
plodding. But years of delicate negotiations have given way
to cautious optimism as more than 190 nations prepare for
the marathon climate talks that begin in Paris on
30 November.

Some long-running disputes remain, such as 
the debate about what cuts in greenhouse-gas 
emissions can be expected of developing nations 
compared with their developed counterparts. 
But there are many signs that the summit, 
convened by the United Nations, will succeed 
in crafting a global climate agreement. These 
include significant commitments by several 
major players, including the United States and 
China, to reduce emissions of greenhouse gases. 

“We are in for some tense negotiations, but I think we’ll
come out of the other end with an agreement,” says Saleemul
Huq, director of the International Centre for Climate
Change and Development in Dhaka, Bangladesh, and adviser to
a negotiating bloc of the least-developed countries.

And although Paris is still reeling from the 
deadly terror attacks of 13 November, which 
led the authorities to increase security for the 
meeting and cancel a big climate march, more 
than 130 heads of government and state are still 
expected to attend the two-week summit. 

The last major push for a climate treaty faltered in
Copenhagen six years ago over whether developing countries
should be asked to match developed countries and make
voluntary commitments to reduce emissions. The political
situation has evolved since then and more than
165 countries have submitted pledges to combat climate
change. Although these pledges would not cut greenhouse-gas
emissions enough to meet the UN goal of limiting global
warming to 2 °C above pre-industrial levels, they show a
level of commitment that was missing in Copenhagen.

“Countries are bringing more political will than ever
before, and so we'll see if the process can deliver,” says
Elliot Diringer, executive vice-president of the Center for
Climate and Energy Solutions, an environmental think tank
in Arlington, Virginia. “This agreement has the potential
to be a significant turning point.”

Despite a lingering — and potentially 
volatile — debate about whether those com- 
mitments will be legally binding under inter- 
national law, they are expected to remain 
voluntary. One of the biggest obstacles to a bind- 
ing agreement is the US Senate. On 17 Novem- 
ber, Republican senators pushed through 
legislation seeking to block regulations to limit 
greenhouse-gas emissions from power plants. 
US President Barack Obama can veto these bills, 
but he cannot force the Senate, which has the 
power to reject or approve treaties, to endorse a 
climate agreement that includes binding limits 
on greenhouse-gas emissions. 

As a result, much of the debate will centre
on creating mechanisms that allow govern- 
ments — and civil society — to monitor pro- 
gress, build trust and ensure accountability. 


Environmentalists and many governments 
are pushing for a five-year review period that 
would begin immediately after the Paris talks 
end; governments would need to return to the 
table with new commitments in 2020. 

Huq says that this exercise is particularly important for
poor and vulnerable countries, which are pushing for a
long-term goal of limiting warming to 1.5 °C. The world is
likely to cross a landmark threshold, the 1 °C mark, for
the first time in 2015, and Huq admits that stabilizing at
1.5 °C would require emissions reductions so drastic as to
be politically impossible at this point. But world leaders
should acknowledge that even 2 °C of warming comes with
significant impacts on the world’s poorest citizens, he
says. “We know we are not going to get everything we want
in Paris, but it’s symbolic.”

Samantha Smith, leader of environmental group the WWF's
Global Climate and Energy Initiative in Oslo, says that the
biggest debate in Paris will be over financial aid to help
poor countries to reduce their emissions and cope with the
impacts of climate change. In 2010, wealthy nations
established a Green Climate Fund and committed to increase
climate aid to US$100 billion annually by 2020. Developing
countries will be looking for details about that commitment
and what comes next.

The good news, Smith says, is that the 
conversation about climate action has changed, 
not just within the negotiations but among 
faith groups, the general public and businesses, 
many of which will make their own voluntary 
emissions commitments in Paris. But she cau- 
tions that a new global treaty is just a first step. 
“When we walk out of there, we are still going
to have a lot of work to do.”




ENVIRONMENT 


Green Climate Fund 
faces slew of criticism 


First tranche of aid projects prompts concern over operations of fund for developing nations. 


BY SANJAY KUMAR 


Major questions are swirling around the operations of a
United Nations fund that is supposed to channel billions of
dollars to help developing nations adapt to climate change
and slow its pace.

The Green Climate Fund (GCF) was established at UN talks in
Cancún, Mexico, five years ago, and developing nations see
it as one of their prime hopes for financial assistance in
tackling a warming world.

Yet the fund, which is administered by a 
small team in Incheon, South Korea, is strug- 
gling to raise cash from rich nations. And 
although it approved its first aid commitments 
on 6 November at a meeting in Livingstone, 
Zambia, observers say they are concerned that 
the GCF has cut corners so as to announce 
handouts before international climate talks in 
Paris in December. 

“We are worried about the fund’s social and environmental
safeguards, consultation processes, accountability
mechanisms and transparency,” says Brandon Wu, a policy
analyst who focuses on climate finance at the
non-governmental organization (NGO) ActionAid in Washington
DC and who attended the Zambia meeting.

The Cancún agreement recommended that climate aid total
US$100 billion a year by 2020, but the balance between
private and public money, and how much of it would flow
through the GCF, has not been made clear.

In the world of climate finance, the GCF is a tiny player.
If funding for renewable energy and energy-efficiency
programmes is included, hundreds of billions of dollars
already flow round the globe each year, says the Climate
Policy Initiative (CPI), an international think tank.
Still, the GCF is the largest international public climate
fund.

Flood barriers in Bangladesh could find support from a United Nations climate fund.

The fund’s initial target was to collect $10 billion before
it started handing out cash, which it intends to divide
equally between mitigation and adaptation projects. By
October, it had received pledges of $10.2 billion — which
foreign-exchange rate variations have reduced to
$9.1 billion. But only $5.83 billion had been formally
agreed, and just $852 million had reached the fund’s
pocket. The United States is the most significant missing
name from the list of donor countries: last year it
promised $3 billion, but it has yet to sign an agreement to
contribute money.
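
The gap between pledges and cash in hand can be made concrete with a little arithmetic. The following is only an illustrative sketch in Python: the dollar figures are the ones quoted above, while the variable and function names are hypothetical and chosen for this example.

```python
# Figures quoted in the article, in billions of US dollars.
TARGET = 10.0            # initial fundraising target
PLEDGED_NOMINAL = 10.2   # pledges received by October
PLEDGED_AFTER_FX = 9.1   # pledges after exchange-rate variations
FORMALLY_AGREED = 5.83   # covered by signed agreements
RECEIVED = 0.852         # cash that had actually reached the fund


def percent_of(part: float, whole: float) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


def funding_summary() -> dict:
    """Derive the shortfalls implied by the quoted figures."""
    return {
        "fx_loss_billion": PLEDGED_NOMINAL - PLEDGED_AFTER_FX,
        "agreed_pct_of_pledges": percent_of(FORMALLY_AGREED, PLEDGED_AFTER_FX),
        "received_pct_of_pledges": percent_of(RECEIVED, PLEDGED_AFTER_FX),
        "received_pct_of_target": percent_of(RECEIVED, TARGET),
    }


if __name__ == "__main__":
    for name, value in funding_summary().items():
        print(f"{name}: {value:.1f}")
```

On these figures, under two-thirds of the exchange-adjusted pledges had been formally agreed, and less than a tenth had been paid in.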

“At this pace we will not be able to do anything much,”
says Dipak Dasgupta, an economist and India’s
representative on the 24-person GCF board. The proposals
approved in Zambia — $168 million for eight climate
projects — are “small change”, he says. The approvals
include a wetlands resilience programme in Peru,
climate-resilient infrastructure in Bangladesh and a scheme
of ‘green bonds’ to finance sustainable energy ventures in
Latin America and the Caribbean, but seven of the schemes
will not receive money until they meet further
project-specific conditions.

Developed nations may be reluctant to transfer their money
to the fund, says Timmons Roberts, who studies climate
change and economic development at Brown University in
Providence, Rhode Island. “Many developing countries and
NGOs believe that the funding should all flow through the
GCF,” he says. “However, contributor countries have always
defended their ability to funnel their funds through
channels they control, whether through their own bilateral
agencies (like USAID) or through dedicated World Bank
funds.”


LACK OF TRANSPARENCY 

There are also concerns about how the GCF is run, says Wu,
who attended the Zambia meeting as a permitted ‘civil
society observer’. Wu is worried that indigenous
communities were not adequately consulted before the
approval of $6.2 million for the Peruvian wetlands
programme, for example. GCF documents say that a
consultation was carried out, but for this and for other
projects, the fund has no independent verification of its
claims, says Andrea Rodriguez Osuna, who works in Mexico
City for the non-profit environmental law organization AIDA
and was also present in Zambia.

Nor is the GCF transparent about its pro- 
cesses, Rodriguez Osuna adds. “The fund 
has no information disclosure policy and no 
accountability mechanism, yet the board is 
approving project proposals,” she says. 

For the eight projects approved at the board 
meeting, for example, only proposal docu- 
ments were publicly available (and in the case 
of two private-sector projects, only a sum- 
mary). “These are hardly the unbiased sources 
of information needed to evaluate a project's 
merits or any potential negative impacts,” 
Wu says. Project reviews made by the fund’s
board and by an independent technical advisory
panel are not publicly released, and GCF
officials repeatedly failed to answer questions
asked by Nature for this article.

For some, another contentious issue is that the GCF is
channelling its money mainly through international
organizations, including multilateral and private banks
such as the World Bank and Deutsche Bank, rather than
sending it directly to institutions in developing countries
where the projects are taking place.

The GCF is still new and is seriously under- 
staffed, Rodriguez Osuna adds; and observers 
hope that their worries are teething problems. 
Its executive director, Héla Cheikhrouhou, 
has promised “many more projects under 
development”. 

Claims have already been made that rich 
nations are upscaling public climate funding. 
But experts say that there is little clarity on 
whether the cash is new money, or being re- 
routed from elsewhere, such as from overseas 
development assistance funds. “Definitions
of what constitutes new money haven't been
agreed on,” says Barbara Buchner, who leads
CPI’s global finance programme in Venice, Italy.

One thing is for certain, Buchner says — total finance for
low-carbon energy projects and for adapting to and
mitigating climate change is far short of estimates of the
need. “We need trillions, not billions,” she says.


Brazilian courts tussle over
unproven cancer treatment


Patients demand access to compound despite lack of clinical testing. 


BY HEIDI LEDFORD 


A court in the Brazilian state of Sao Paulo has cut off
distribution of a compound that is hailed by some as a
miracle cancer cure — even though it has never been
formally tested in humans.


On 11 November, to the relief of many cancer researchers, a
state court overturned earlier court orders that had
obliged the nation’s largest university to provide the
compound to hundreds of people with terminal cancer.
Although the reversal applies only to requests for the drug
by residents of Sao Paulo state, administrators at the
university estimate that it covers about 80% of the orders
they have received for the compound.

The compound, phosphoethanolamine, has been shown to kill
tumour cells only in lab dishes and in mice (A. K. Ferreira
et al. Anticancer Res. 32, 95-104; 2012). Drugs that seem
promising in lab and animal studies have a notoriously high
failure rate in human trials. Despite this, some chemists
at the University of Sao Paulo’s campus in Sao Carlos have
manufactured the compound for years and distributed it to
people with cancer. A few of those patients have claimed
remarkable recoveries, perpetuating the compound’s
reputation as a miracle cure.

Phosphoethanolamine capsules were manufactured at the University of Sao Paulo.

Dismayed by this unofficial distribution 
of phosphoethanolamine, the university’s 
administration moved in September 2015 to 
shut it down. Patients took the university to 
court, and in October 2015, Brazil’s Supreme 
Federal Court ruled in favour of one plaintiff 
who wanted the right to try the compound. A 
lower court then began granting orders for the 
university to provide it to others. University 
officials say that they were soon overwhelmed 
by more than 800 requests. 

“The decision not only ignored the opinion 
of medical specialists, but also overlooked the 
fact that the drug has only been tested on ani- 
mals,” says bioethicist Volnei Garrafa at the 
University of Brasilia. “Such court decisions 
bring false expectations for patients and their 
families, creating turmoil in society and con- 
fusion between what is safe and what is not.” 

The Brazilian constitution guarantees universal access to
health care, and it is common in Brazil for patients to
turn to the courts to access drugs that the state
health-care system does not dispense because of their cost,
says Garrafa. But phosphoethanolamine presents a different
situation, he adds, because it is not really a drug at all.
It is not approved by Brazil’s National Health Surveillance
Agency.

Those who argue that people who are terminally ill have a
right to try experimental medicines saw the decision
earlier this year as a significant victory. But to the
university administration, drug regulators and cancer
researchers, it showed blatant disregard for the basic
scientific principle that a drug should be demonstrated to
be safe and effective before being given to patients
outside of a clinical trial.

“It’s a violation of the autonomy of the university,” says
Marco Antonio Zago, a physician and president of the
University of Sao Paulo. “We are seen as a factory to
produce something that we do not believe should be done.”

Phosphoethanolamine is an important building block of the
lipids that make up cell membranes. The compound can also
act as a molecular signal that activates certain cellular
processes. Although some studies do suggest that the
compound may kill cancer cells in isolated cells and mice,
it is not entirely clear how the compound brings about this
response. Biochemist Durvanei Augusto Maria at the Butantan
Institute in Sao Paulo believes that the compound may be
imported into tumour cells and, once inside, trigger
processes that cause the cell to self-destruct.
Immunologist James Venturini at Sao Paulo State University
and his colleagues have found that phosphoethanolamine may
modulate the immune system’s response to cancer or affect
cell division (M. S. P. de Arruda et al. Braz. Arch. Biol.
Technol. 54, 1203-1210; 2011).

But to justify using phosphoethanolamine 
in people, Venturini says, one would have to 
rigorously test it in a series of clinical trials 
using human volunteers. “I strongly believe
that double-blind, randomized clinical studies
are necessary,” he says.

And even before such trials, further preclini- 
cal studies would have to be done, says Jailson 
Bittencourt de Andrade, secretary for research- 
and-development policy at Brazil’s science and 
technology ministry. The ministry plans to fund 
those studies, he says, and has already asked sev- 
eral research laboratories in the country to do 
the work. If those tests and subsequent clinical 
trials are successful, he says, the ministry will 
also fund the research needed to scale up phos- 
phoethanolamine production to the quantities 
and quality needed for an approved drug. 

That process will take years. In the 
meantime, lawyers representing people with 
cancer have vowed to appeal against the latest 
ruling. If those appeals succeed, de Andrade 
worries that people will not wait until all the 
tests are completed, and may even abandon 
conventional treatment in favour of
phosphoethanolamine. “Many patients have come
forward and said they have tried the drug
and it has worked for them,” he says. “So the
other patients and their families — they want
phosphoethanolamine now.” SEE EDITORIAL P.410


TIMEKEEPING 


Leap-second decision delayed 


Nations fail to agree on whether to scrap an adjustment that keeps official 
time in sync with Earth’s rotation. 


BY ELIZABETH GIBNEY 


A leap second is gone in the blink of an eye. But a
decision on whether to ditch these occasional time
insertions — which keep official time synced with Earth’s
rotation — has been delayed for at least eight years.


This month, the International Telecommuni- 
cation Union (ITU), which bears responsibility 
for defining official Coordinated Universal 
Time (UTC), was expected to reach a consensus. 
But representatives who discussed the issue at 
the World Radiocommunication Conference in 
Geneva, Switzerland, failed to agree on whether 
the leap second’s costs outweigh its benefits. 


Leap seconds, which occur once every few years, are
necessary because Earth’s rotation is slowing in an
unpredictable way. Without them, the time of day when the
Sun is at the highest point in the sky would drift by about
one minute over about 100 years. However, these extra
seconds have to be programmed into electronic systems
manually and can upset systems that depend on accurate
timings.

Most countries, including China, the
United States and large parts of Europe,
favour scrapping the leap second and basing
UTC on the continuous tick of atomic clocks.

Official time would slowly move out of 
sync with Earth’s rotation, but — given that 
it would take thousands of years to accu- 
mulate a difference that is greater than the 
shifts already caused by daylight saving
time — many argue that this would cause
few problems. “We are already shifted by 
one hour in summer compared to winter 
time,” says Elisa Felicitas Arias, director of 
the Time Department at the International 
Bureau of Weights and Measures (BIPM) in 
Sevres, France, who wants to scrap the leap 
second. “Are we affected because of that?” 
A correction — perhaps a leap minute or 
hour — could be added once the drift is 
appreciable. 
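
As a rough check on the "thousands of years" claim, the drift figure quoted in the article (about one minute over about 100 years) can be extrapolated to the one-hour shift of daylight saving time. The sketch below is in Python and assumes, purely for illustration, that the drift rate stays constant; the article itself notes that Earth's slowing is unpredictable, so the result is an order-of-magnitude estimate only. The function and constant names are hypothetical.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600

# Drift quoted in the article: about one minute over about 100 years.
DRIFT_SECONDS = 1 * SECONDS_PER_MINUTE
DRIFT_PERIOD_YEARS = 100


def years_to_accumulate(offset_seconds: float, rate_seconds_per_year: float) -> float:
    """Years needed to build up `offset_seconds` at a constant drift rate.

    Assumption: the rate does not change over time (a simplification).
    """
    return offset_seconds / rate_seconds_per_year


# Average drift rate implied by the quoted figures, in seconds per year.
rate = DRIFT_SECONDS / DRIFT_PERIOD_YEARS

# Time for the drift to match a one-hour daylight-saving shift.
years_for_one_hour = years_to_accumulate(SECONDS_PER_HOUR, rate)

if __name__ == "__main__":
    print(f"Implied drift rate: {rate:.2f} s per year")
    print(f"Years to drift one hour: about {years_for_one_hour:.0f}")
```

Under that constant-rate assumption the drift works out to roughly 0.6 seconds a year, so matching a one-hour shift would take on the order of 6,000 years, in line with the "thousands of years" quoted above.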

A small number of countries, however,
including Russia and the United Kingdom, 
want to keep the leap second. Russia is 
concerned about how its global navigation 
system, GLONASS — the only one to incor- 
porate leap seconds — would cope, says 
Vincent Meens of France's National Centre 
for Space Studies, and the chair of the ITU 
subgroup tasked with debating the topic. 
Britain’s argument is based largely on the 
desire to keep a link between official time 
and Earth's rotation, says Peter Whibberley, 
a metrologist at the National Physical Labo- 
ratory in Teddington, UK. 

Astronomers are among those who 
would be affected if the leap second were to 
be scrapped. Their software would need to 
cope with Earth's rotational time — which 
defines when stars and galaxies are seen 
in the sky — being offset by more than a 
second from universal time, says Meens. 

On 18 November, the ITU announced 
that it would defer a decision until 2023,
when it will have more information on the 
impacts of losing the second. 

The union did, however, decide to make
changes to the international treaty that currently
defines UTC, and in turn the leap
second. Rather than having a stand-alone
definition of UTC, the treaty will cite an SI
definition, and mention of the leap second
will move to become part of a ‘description
of UTC’ in a subsidiary section of the treaty
that expires in 2023.

Whibberley says that the effect will be 
to remove responsibility for UTC from the
ITU, and that the General Conference on 
Weights and Measures (CGPM) — which 
is already responsible for defining SI units, 
including the second — is most likely to 
become the authority in the future. But 
the change is unlikely to speed up the deci- 
sion on whether to scrap the leap second: 
the CGPM’s next chance to even propose a 
change is not until 2018.

Decades of studies on chimpanzee brains and behaviour will be captured in an online resource.


BIOMEDICAL RESEARCH 


Chimps retire to 
a digital world 


NIH to fund a cache of brain tissue and online data in 
place of live-animal experimentation. 


BY SARA REARDON 


Panzee the chimpanzee was a skilled
communicator that could tell untrained
humans where to find hidden food by 
using gestures and vocalizations. Austin the 
chimp was particularly adept with a computer, 
and scientists have been scanning its genome 
for clues to its unusual cognitive abilities. 
Both apes lived at a language-research centre 
at Georgia State University in Atlanta, and both 
died several years ago — but they will live on in 
an online database of brain scans and behavioural
data from nearly 250 chimpanzees.
Researchers hope to combine this trove, now 
in development, with a biobank of chimpan- 
zee brains to enable scientists anywhere in the 
world to study the animals’ neurobiology. 


© 2015 Macmillan Publishers Limited. All rights reserved 


This push to repurpose old data is espe- 
cially timely now that the US National Insti- 
tutes of Health (NIH) has decided to retire its 
remaining research chimpanzees. The agency 
decommissioned more than 300 animals in 
2013, but kept 50 available for research in case 
of a public-health emergency. Following an 
18 November decision, this remaining popula- 
tion will also be sent to sanctuaries in the com- 
ing years. The NIH also hopes to retire another 
82 chimps that it supports but does not own, 
says director Francis Collins. 

“We were on a trajectory toward zero, and
today’s the day we’re at zero,” says Jeffrey Kahn,
a bioethicist at Johns Hopkins University in 
Baltimore, Maryland, who led a 2011 study 
on the NIH chimp colony for the Institute of 
Medicine. 


VINCENT J. MUSI/NATL GEOGRAPHIC CREATIVE 


The NIH’s latest move, along with a decision 
in June by the US Fish and Wildlife Service 
to give research chimps endangered-species 
protections, effectively ends the possibility 
of biomedical research on the animals in the 
United States. 

The retirement of the NIH chimps will also 
end non-invasive studies on the 139 NIH- 
owned animals at the University of Texas MD 
Anderson Cancer Center primate facility 
in Bastrop. Its director, Christian Abee, says 
that researchers have published more than 
50 behavioural studies since 2012 using these 
animals. “There is no other alternative for
cognitive research in chimpanzees,” he says.

That makes the NIH-funded chimp 
database all the more important. “This is a 
very unique window of opportunity to make 
sure that there's a legacy and a contribution 
from the lives they have lived,” says project 
leader Chet Sherwood, a biological anthro- 
pologist at George Washington University in 
Washington DC. 


ONLINE LEGACY 

In the next few months, Sherwood’s team
plans to launch a website with a database for
researchers and an educational component
for the public. The site will eventually include
existing data on the chimps’ performance in
behaviour and personality tests, scans of the
primates’ brain structure and activity, and
their pedigrees and some genetic information.
Sherwood and his colleagues plan to model the
website on that of the Human Connectome
Project, an open-access collection of brain
scans from 1,200 individuals that researchers
can use to study the links between brain structure
and activity and human traits.
The team is also collaborating with the Allen
Brain Institute in Seattle, Washington, to create
an atlas of gene expression in the chimp brain.
Researchers who want to study chimp brains in
more detail can request tissue and blood samples
from the team, which has nearly 250 preserved
organs stored at facilities in Washington DC
and Atlanta.

But some scientists and advocates worry 
about the consequences of losing access to 
research chimps. Frankie Trull, director of 
the Foundation for Biomedical Research in 
Washington DC, which advocates for animal 
research, says that the US government may 
regret its decision if a public-health threat 
emerges that would be best studied in chimpanzees.
Others caution that the dwindling number of
research animals will make it difficult to develop
therapies (such as vaccines against Ebola) for
wild chimps, which would
help both the animals and human beings. 

In the meantime, the NIH is struggling to 
find homes for its newly retired chimps. By law, 
retired animals are sent to a federal sanctuary 
known as Chimp Haven in Keithville, Louisi- 
ana, but that facility has only 25 places available 
now. Nearly 310 NIH-owned animals need to 
be resettled, and Collins says that the agency 
is still evaluating its options — a situation that 
worries lawmakers. 

On 20 November, two members of Con- 
gress sent the NIH a letter asking the agency 
for its plan to rehome the remaining chimps. 
“We want to make sure that for the sake of 
taxpayers and these much-abused chimpanzees,
these delays are overcome immediately,”
they wrote. 

Although retired, the apes of Chimp Haven 
may one day re-enter research labs — posthu- 
mously. Sherwood’s team is drafting an agree- 
ment with the sanctuary to obtain the animals’ 
brains when they die; it also hopes to acquire 
organs from chimps in zoos and research 
facilities. “You can imagine 20 years from now, 
this ageing population won't be here,” he says. 
“If we weren't making the efforts today, there 
wouldn't be a way to study neurobiology in 
chimpanzees.”


ALL TOGETHER NOW


After 25 years of negotiations, all countries are finally set to take steps 
to limit global warming. A special issue examines the path to the 
Paris climate summit, and the road beyond. 


ILLUSTRATION BY DAVID PARKINS 


When more than 190 nations gather in Paris on
30 November to broker an agreement to mitigate climate
change, it will be a turning point for the planet.
A successor to the 1997 Kyoto Protocol has been a long
time coming. A previous attempt to shape a global agreement
fell apart in 2009, at talks in Copenhagen. Now the world is
ready to try again, and for the first time, all countries are
poised to take action (see page 418). But the history here is
sobering: the quest to build a global climate treaty has hit
many obstacles over the past 25 years. Its dramatic story is
chronicled in a comic starting on page 427.

Although the United Nations aims to limit global warming
to 2 °C, a News Feature on page 436 reveals that this will be
much harder than many studies have indicated. Part of the
difficulty will be ensuring that any treaty leads to actions
with lasting global momentum, say climate-policy experts
David Victor and James Leape (see page 439). But Johan
Rockström, director of the Stockholm Resilience Centre,
argues on page 411 that Paris will be a success if it shows that
the world is serious about addressing the climate problem.

To explore the backstory to the talks, historian Adam Rome
reviews seminal books on sustainability from the 1960s and
1970s (see page 443). A News story explores the challenges
facing the Green Climate Fund, a UN mechanism to help
developing nations adapt to climate change (see page 419).

Online, Nature presents videos about the climate summit,
as well as other unique material, at www.nature.com/parisclimate.
We will also cover the talks as they happen.

Any agreement reached in Paris will not solve the climate
problem, but it could lay a solid foundation for collective
action (see page 409). The quest to save the planet will
continue for many more years.


Can nations unite to save
Earth’s climate? 


A COMIC BY 
RICHARD MONASTERSKY 
AND NICK SOUSANIS 


PARIS CLIMATE TALKS 


A Nature special issue
nature.com/parisclimate


WHEN THE WORLD’S NATIONS GATHER IN PARIS THIS
DECEMBER TO NEGOTIATE A CLIMATE TREATY, THEIR
EFFORTS WILL CAP A 25-YEAR-LONG JOURNEY
PLAGUED BY DETOURS AND DEAD ENDS.


THE QUEST STARTED IN 1990 WHEN 
THE UNITED NATIONS LAUNCHED 
TALKS AIMED AT PRODUCING THE 
FIRST GLOBAL CLIMATE AGREEMENT. 


NATIONS GATHERED FOR THE EARTH 
SUMMIT IN RIO DE JANEIRO, BRAZIL. 


IN RIO, THEY ADOPTED THE UNITED NATIONS
FRAMEWORK CONVENTION ON CLIMATE CHANGE,
WHICH DECLARED:

“The ultimate objective of this Convention
... is to achieve ... stabilization of
greenhouse gas concentrations in the
atmosphere at a level that would prevent
dangerous anthropogenic interference
with the climate system.”


THE RIO CONVENTION WAS A HISTORIC STEP,
BUT IT CONTAINED NO BINDING
COMMITMENTS TO SLOW GLOBAL WARMING.


[Chart axis: Carbon (billion tonnes)]

MAURICE STRONG,
ORGANIZER OF
THE RIO SUMMIT


WHAT IS THE GREENHOUSE EFFECT?
WATER VAPOUR, CARBON DIOXIDE AND
OTHER GASES IN THE ATMOSPHERE
KEEP THE PLANET WARMER THAN IT
WOULD OTHERWISE BE.

SOLAR RADIATION WARMS EARTH’S SURFACE,
WHICH RADIATES INFRARED ENERGY.
SOME OF THAT ENERGY WARMS THE ATMOSPHERE.
GREENHOUSE GASES ABSORB AND RE-EMIT
INFRARED RADIATION.

MARS: GREENHOUSE EFFECT, AVG TEMP -63 °C
EARTH: ENHANCED GREENHOUSE EFFECT, AVG TEMP 15 °C AND RISING
VENUS: EXTREME GREENHOUSE EFFECT, AVG TEMP 460 °C

BY ADDING EXTRA CO2, METHANE
AND OTHER POLLUTANTS,
HUMANS ARE STRENGTHENING
THE GREENHOUSE EFFECT.


IN 1896, THE SWEDISH SCIENTIST SVANTE ARRHENIUS
CALCULATED HOW CHANGES IN THE AMOUNT OF CO2
IN THE ATMOSPHERE COULD WARM OR COOL EARTH.

[Image: “On the Influence of Carbonic Acid in the Air upon
the Temperature of the Ground”, by Prof. Svante Arrhenius,
London, Edinburgh, and Dublin Philosophical Magazine and
Journal of Science, April 1896]

HE LATER SUGGESTED HUMANS WERE
RAISING THE PLANET’S TEMPERATURE
AND IT WOULD BECOME NOTICEABLE
IN A FEW CENTURIES.

THE CHANGES CAME
MUCH FASTER THAN
ARRHENIUS ANTICIPATED.

[Chart: Global temperature trend; difference in temperature (°C)
relative to 1951-80 average]


ON 23 JUNE 1988, NASA
SCIENTIST JAMES
HANSEN TOLD A US
SENATE HEARING THAT
HUMANS WERE HAVING A
CLEAR IMPACT BY
BURNING FOSSIL FUELS
SUCH AS COAL, OIL AND
NATURAL GAS.

[Chart label: Average CO2 concentration 295 parts per million (p.p.m.)]

“THE GREENHOUSE
EFFECT HAS BEEN
DETECTED, AND IT
IS CHANGING OUR
CLIMATE NOW.”

IT WAS A WAKE-UP
CALL TO THE WORLD.

THE COUNTRY WAS
SUFFERING ONE OF ITS
WORST DROUGHTS EVER,


ACTIVIST CHICO MENDES
DREW ATTENTION TO THE
RAMPANT DESTRUCTION OF
THE AMAZON FOREST.

FOSSIL FUELS ARE NOT THE ONLY
CAUSE OF WARMING. DEFORESTATION
ALSO CONTRIBUTES BY RELEASING
THE CO2 STORED IN TREES.

Landsat satellite images show
forest loss in Rondônia, Brazil.


ALARMED BY THE GROWING
PROBLEM, THE UNITED NATIONS
CREATED THE INTERGOVERNMENTAL
PANEL ON CLIMATE CHANGE (IPCC)
IN 1988 TO ASSESS THE ISSUE.

AT THE IPCC’S FIRST MEETING, THE DIRECTOR OF
THE UNITED NATIONS ENVIRONMENT PROGRAMME,
MOSTAFA TOLBA, IMPLORED SCIENTISTS TO USE
THE TIME LEFT IN THE CENTURY - JUST 4,000
DAYS - TO DEAL WITH CLIMATE CHANGE.

HOPES WERE HIGH BECAUSE [...]
TAKEN STEPS TO SOLVE [...]

IN 1987, NATIONS ADOPTED A TREATY
TO PROTECT THE OZONE LAYER.

BUT REACHING THAT AGREEMENT
WAS RELATIVELY EASY BECAUSE
ONLY A HANDFUL OF COMPANIES IN
A FEW COUNTRIES PRODUCED
OZONE-DESTROYING COMPOUNDS.

[...] IS INFINITELY MORE [...]

IN THE CASE OF GLOBAL WARMING,
EVERYONE HAS A HAND IN THE PROBLEM
BECAUSE SO MANY ACTIVITIES GENERATE
GREENHOUSE GASES.

IN ITS FIRST REPORT, THE IPCC
FORECASTED THAT IF CURRENT
TRENDS CONTINUE UNTIL 2100,
THE WORLD WOULD BE 4°C
WARMER THAN IT WAS IN 1850.
SWELLING OCEANS WOULD BE
A MAJOR PROBLEM ...

... BECAUSE HALF OF HUMANITY
INHABITS COASTAL REGIONS.

[Chart: IPCC 1990 projected sea-level rise]


A MONSTROUS CYCLONE DROVE THAT POINT
HOME IN 1991 WHEN IT KILLED MORE THAN
140,000 PEOPLE IN BANGLADESH.


THE RIO TREATY WAS CLEARLY NOT 
ENOUGH, SO NATIONS GATHERED 
IN 1995 IN BERLIN TO NEGOTIATE 
A STRONGER ACCORD. 


[...] HAD TO ACT FIRST BECAUSE THEY
HAD CAUSED THE PROBLEM

BUT THE ASSEMBLED COUNTRIES
COULDN'T AGREE ON SPECIFICS.

[...] MOST OF THE PROBLEM - TO
CUT EMISSIONS BY 20%,

DURING ALL-NIGHT NEGOTIATIONS,
GERMANY’S ENVIRONMENT MINISTER,
ANGELA MERKEL, BROKERED A
DEAL. COUNTRIES WOULD HAVE TWO
YEARS TO AGREE ON EMISSIONS
LIMITS FOR DEVELOPED NATIONS.


IN DECEMBER 1997, COUNTRIES GATHERED IN
KYOTO, JAPAN, TO HASH OUT A NEW TREATY. BUT
THEY COULDN’T AGREE ON HOW MUCH DEVELOPED
NATIONS SHOULD TRIM THEIR EMISSIONS.

AFTER WORKING THROUGH THE
FINAL NIGHT, NEGOTIATORS
REACHED AN AGREEMENT
CALLED THE KYOTO PROTOCOL.
IT WAS THE FIRST TIME THAT
COUNTRIES PROMISED TO REIN
IN GREENHOUSE-GAS POLLUTION
BY SPECIFIC AMOUNTS.


THE US WANTED DEVELOPING COUNTRIES TO ACT, TOO.


THE KYOTO PROTOCOL
SPLIT THE WORLD IN
TWO: INDUSTRIALIZED
COUNTRIES WITH
EMISSIONS LIMITS ...

... AND DEVELOPING
COUNTRIES WITHOUT.

THE PROTOCOL ALSO ALLOWED FOR
FLEXIBILITY IN HOW COUNTRIES MET
THEIR COMMITMENTS. DEVELOPED
NATIONS COULD GET CREDIT FOR
REDUCING EMISSIONS IN POORER ONES.

DEVELOPED COUNTRIES PROMISED TO CUT THEIR OVERALL
EMISSIONS TO 5.2% BELOW 1990 LEVELS FOR THE
PERIOD 2008-12. EACH COUNTRY HAD ITS OWN TARGET.

Iceland              +10%
Australia             +8%
Norway                +1%
Russian Federation     0%
Canada                -6%
Japan                 -6%
US                    -7%
Cumulative          -5.2%

BUT THE CRACKS
IN THE TREATY
WERE CLEAR
FROM THE START.

THE US REFUSED TO RATIFY THE
PACT BECAUSE OF CONCERNS THAT
ITS ECONOMY WOULD SUFFER WHILE
DEVELOPING NATIONS INCREASED
THEIR POLLUTION WITHOUT LIMITS.

IN 2001, US PRESIDENT GEORGE W. BUSH
REJECTED THE AGREEMENT, SAYING “THE
KYOTO PROTOCOL WAS FATALLY FLAWED
IN FUNDAMENTAL WAYS.”


SOON, WORLD EVENTS MADE CLEAR HOW
LIMITED THE PROTOCOL WAS. IN 2006,
CHINA PASSED THE US TO BECOME THE
WORLD’S LARGEST CARBON EMITTER.

2010 HOTTEST YEAR ON RECORD

CANADA FORMALLY WITHDREW
FROM KYOTO IN 2011.


THROUGHOUT THE CLIMATE
NEGOTIATIONS, SCIENTISTS HAVE
TRIED TO SHOW WHAT KIND OF
WORLD AWAITS FUTURE
GENERATIONS IF GLOBAL
WARMING CONTINUES.


SUCH FORECASTS COME FROM
COMPLEX CLIMATE SYSTEM
MODELS, WHICH DIVIDE THE
GLOBE INTO MILLIONS OF CELLS ...

... AND SIMULATE THE
ATMOSPHERE, OCEANS
AND BIOSPHERE IN 3D.


RESEARCHERS HAVE CONFIDENCE
IN THEIR MODELS BECAUSE THEY
CAN REPRODUCE FEATURES OF
PAST AND CURRENT CLIMATES.

[Chart: Temperature anomalies (°C); model versus observations]

[Figure: Projected temperature change in 2100
for a mean global warming of 4°C]


ALTHOUGH SCIENTISTS AGREED HUMANS WERE WARMING
THE PLANET, SOME POLITICIANS DENIED THAT FACT.

“WITH ALL OF THE HYSTERIA, ALL OF THE FEAR,
ALL OF THE PHONY SCIENCE, COULD IT BE THAT
MAN-MADE GLOBAL WARMING IS THE GREATEST
HOAX EVER PERPETRATED ON THE AMERICAN [...]

“WE SHOULD INVESTIGATE THE WELL-FUNDED
EFFORT BY CERTAIN OIL COMPANIES TO [...]
ON THE REALITY OF GLOBAL WARMING.”
- US REPRESENTATIVE HENRY WAXMAN, 2006


IN 2003, EUROPE
SUFFERED A
PROLONGED HEAT
WAVE THAT KILLED
AN ESTIMATED
70,000 PEOPLE.

[Chart: Excess deaths per day]

THE IMPACTS WERE GETTING
CLEARER. THE IPCC DECLARED IN
2007: “WARMING OF THE CLIMATE
SYSTEM IS UNEQUIVOCAL.”

LATER THAT YEAR, THE IPCC
WAS AWARDED THE NOBEL
PEACE PRIZE FOR ITS EFFORTS.


WHILE THE SCIENCE RACED
AHEAD, THE NEGOTIATIONS
DRAGGED ON. NATIONS MET
IN BALI IN DECEMBER 2007.

FOR THE FIRST TIME, DEVELOPING
NATIONS AGREED TO “MITIGATION
ACTIONS” OF THEIR OWN CHOOSING
TO LIMIT CLIMATE CHANGE.


THE TALKS WERE SO FRACTIOUS THAT
AT ONE POINT, THE CHAIRMAN BROKE [...]

[...] SET FOR A TREATY IN 2009 THAT
WOULD INCLUDE NEW COMMITMENTS
BY DEVELOPED COUNTRIES.

[...] ADDRESS DEFORESTATION.


IN THE RUN-UP TO THE 2009 [...]


SOME DEMANDED
REDUCING CO2 LEVELS TO
350 P.P.M., WHICH WOULD
[...] FUTURE WARMING
AND LESSEN THE RISKS OF
DANGEROUS IMPACTS [...]
EXTREME SEA-LEVEL RISE
AND MEGA-DROUGHTS.


DESPITE THE FRENZY OF ATTENTION,
THE COPENHAGEN NEGOTIATIONS
FAILED TO DELIVER A TREATY.


WESTERN NATIONS BLAMED
CHINA FOR BLOCKING
SUBSTANTIVE EMISSIONS LIMITS.


The 2 °C dream


Countries have pledged to limit global warming to 2 °C, and climate models
say that is still possible. But only with heroic, and unlikely, efforts.


BY JEFF TOLLEFSON


The year is 2100 and the world looks nothing like it did when global leaders
gathered for the historic climate summit in Paris at the end of 2015. Nearly
8.8 billion people now crowd the planet. Energy consumption has nearly doubled,
and economic production has increased more than sevenfold. Vast disparities in wealth
remain, but governments have achieved one crucial goal: limiting global warming to
2 °C above pre-industrial temperatures.

The United Nations meeting in Paris proved to be a turning point. After forging a
climate treaty, governments immediately moved to halt tropical deforestation and to
expand forests around the globe. By 2020, plants and soils were stockpiling more than
17 billion tonnes of extra carbon dioxide each year, offsetting 50% of global CO2
emissions. Several million wind turbines were installed, and thousands of nuclear power
plants were built. The solar industry ballooned, overtaking coal as a source of energy in
the waning years of the twenty-first century.

But it took more than this. Governments had to drive emissions into negative
territory, essentially sucking greenhouse gases from the skies, by vastly increasing
the use of bioenergy, capturing the CO2 generated and then pumping it underground
on truly massive scales. These efforts pulled Earth back from the brink.
Atmospheric CO2 concentrations peaked in 2060, below the target of 450 parts
per million (p.p.m.) and continue to fall.

That scenario for conquering global warming is one possible, if optimistic,
vision of the future. It was developed by modellers at the Joint Global Change
Research Institute in College Park, Maryland, as part of a broad effort by
climate scientists to chart possible paths for limiting global warming to 2 °C,
a target enshrined in the UN climate convention that will produce the Paris treaty.

Climate modellers have developed dozens of rosy 2°C scenarios over 
several years, and these fed into the latest assessment by the Intergov- 
ernmental Panel on Climate Change (IPCC). The panel seeks to be 
policy-neutral and has never formally endorsed the 2-degree target, 
but its official message, delivered in April 2014, was clear: the goal is 
ambitious but achievable. 

This work has fuelled hope among policymakers and environ- 
mentalists, and it will provide a foundation for debate as governments 
negotiate a new climate agreement at the UN's 2015 
Paris Climate Conference starting on 30 Novem- 
ber. Despite broad agreement that the emissions- 
reduction commitments that countries have 
offered up so far are insufficient, policymakers 
continue to talk about bending the emissions curve 
downwards to remain on the path to 2 degrees that 
was laid out by the IPCC. 

But take a closer look, some scientists argue, 
and the 2°C scenarios that define that path seem 
so optimistic and detached from current political 
realities that they verge on the farcical. Although the caveats and uncer- 
tainties are all spelled out in the scientific literature, there is concern that 
the 2°C modelling effort has distorted the political debate by obscuring 
the scale of the challenge. In particular, some researchers have questioned 
the viability of large-scale bioenergy use with carbon capture and stor- 
age (CCS), on which many models now rely as a relatively cheap way to 
provide substantial negative emissions. The entire exercise has opened up 
a rift in the scientific community, with some people raising ethical questions
about whether scientists are bending to the will of politicians and
government funders who want to maintain 2 °C as a viable political target. 

“Nobody dares say it’s impossible,” says Oliver Geden, head of the Euro- 
pean Union Research Division at the German Institute for International 
and Security Affairs in Berlin. “Everybody is sort of underwriting 
the 2-degree cheque, but scientists have to think about the credibility 
of climate science.” 

Modellers are first to acknowledge the limits of their work, and say that 
the effort is designed to explore options, not predict the future. “We’ll tell
you how many nuclear power plants you need, or how much CCS, but we
can’t tell you whether society is going to be willing to do that or not,” says
Leon Clarke, a senior scientist and modeller at the Joint Global Change
Research Institute. “That’s a different question.”


ONE TRILLION TONNES 

The idea of limiting global warming to 2 °C dates back to 1975, when 
economist William Nordhaus of Yale University in New Haven, Con- 
necticut, proposed that more than 2 or 3 degrees of warming would push 
the planet outside the temperature range of the past several hundred thou- 
sand years. In 1996, the EU adopted that limit, and the Group of 8 (G8) 
nations signed on in 2009. The parties to the UN convention on climate 
change affirmed the target in 2009 at their Copenhagen summit, and then 
formally adopted it a year later in Cancún, Mexico.

The move caught scientists off guard. Before 2009, most modellers
had focused on scenarios in which atmospheric CO2 concentrations
stabilized around 550 p.p.m. (double the pre-industrial level), which
would probably limit warming to a little less than 3 °C. But as political
interest in the 2 °C target grew, a few started exploring the implications.
In April 2009, a team led by Myles Allen, a climate scientist at the
University of Oxford, UK, published¹ a study concluding that humans
would have to limit their total cumulative carbon emissions to 1 trillion
tonnes (more than half of which had already been dumped into the
atmosphere) to maintain a chance of limiting warming to 2 °C. This
trillion-tonne carbon budget provided a scientific baseline for what was
now a politically important target, and many modellers shifted gears.
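
The budget arithmetic in this paragraph can be made concrete with a few lines of Python. This is only an illustrative sketch: the 1-trillion-tonne ceiling comes from the study described above, whereas the amount already emitted (0.55 trillion tonnes, standing in for "more than half") and the constant annual emission rate (10 billion tonnes of carbon) are assumed round numbers, and the function names are hypothetical.

```python
# Illustrative carbon-budget arithmetic.
# From the article: a cumulative ceiling of 1 trillion tonnes of carbon.
# Assumptions (not from the article): 0.55 trillion tonnes already emitted
# and a constant emission rate of 10 billion tonnes of carbon per year.

BUDGET_TONNES = 1.0e12            # cumulative ceiling quoted in the article
ASSUMED_EMITTED_TONNES = 0.55e12  # hypothetical stand-in for "more than half"
ASSUMED_ANNUAL_TONNES = 10.0e9    # hypothetical constant yearly emissions


def remaining_budget(emitted: float, budget: float = BUDGET_TONNES) -> float:
    """Tonnes of carbon that can still be emitted (never below zero)."""
    return max(budget - emitted, 0.0)


def years_until_exhausted(emitted: float, annual: float,
                          budget: float = BUDGET_TONNES) -> float:
    """Years until the budget runs out at a constant annual emission rate."""
    if annual <= 0:
        raise ValueError("annual emissions must be positive")
    return remaining_budget(emitted, budget) / annual


if __name__ == "__main__":
    left = remaining_budget(ASSUMED_EMITTED_TONNES)
    years = years_until_exhausted(ASSUMED_EMITTED_TONNES, ASSUMED_ANNUAL_TONNES)
    print(f"Remaining budget: {left / 1e9:.0f} billion tonnes")
    print(f"Exhausted after about {years:.0f} years at the assumed rate")
```

With these assumed inputs the budget would last only a few decades. Any emissions after that point would have to be offset by removals, which is the reasoning behind the scenarios that rely on negative emissions.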

“There were very few scenarios with stringent targets such as 2°C, 
and then sponsors started demanding it,” says Massimo Tavoni, deputy 
coordinator of climate-change programmes at the Eni Enrico Mattei 
Foundation in Milan, Italy. 

The flurry of modelling efforts that followed split into two main camps:
pay early or pay late (see ‘Two paths to 2 °C’). In the former, nations need
to slash greenhouse-gas emissions immediately; in the latter, they can
buy time for a slower phase-out by developing a massive infrastructure
to suck CO2 out of the air.

“Models that have these negative emissions really do let you continue
to party on now, because you have these options later,” says John Reilly,
co-director of the Joint Program on the Science and Policy of Global
Change at the Massachusetts Institute of Technology (MIT) in Cambridge.

In the pay-later approach, most models rely on
a combination of bioenergy and CCS. The system
starts with planting crops that are harvested
and either processed to make biofuels or burnt
to generate electricity, which provide carbon-neutral
power because the plants absorb CO2 as
they grow. The CO2 created when the plants are
processed is captured and pumped underground,
and the process as a whole eats up more emissions
than it creates. A consortium sponsored by the US Department of
Energy has tested such a system at one facility that produces bioethanol
fuel in Illinois, but neither bioenergy nor CCS has been demonstrated
on anywhere near the scales imagined by the models.

“It’s just simple arithmetic: the carbon budget is so small that you
need to go negative, or at least you need to offset some of your emissions
in order to get to zero,” says Tavoni. “We tried to be honest, and pretty
agnostic about whether these transformations are easily achievable.”

On the basis of those models and other information, the IPCC
estimates that climate mitigation would reduce the projected global
consumption in 2100 by 3-11%, a relatively modest amount that
would allow the global economy to keep growing overall. But remove
either bioenergy or CCS from the scenarios and the costs increase
substantially. If mitigation is delayed or bioenergy and CCS are
constrained, most models simply can’t limit warming to 2 °C.

The question is whether any of those models accurately reflect tech- 
nical and social challenges. MIT has a model that tends to project costs 
two or three times the average reported by the IPCC, in part because it 
tries to reflect difficulties in scaling up any technology, such as the
availability of skilled labour and natural resources in different regions. And
then there are the technical hurdles. Capturing CO2 from power plants
has proved more difficult and expensive than many had hoped. Just one
commercial project is currently operating, at the Boundary Dam Power 
Station in Saskatchewan, Canada. 

Moreover, Reilly says, the number of models that actually completed 
2°C scenarios remains relatively small, and they probably project lower 
mitigation costs than those that are not able to generate these low- 
emissions scenarios. “It's a very self-selecting set of models.” 

Although the caveats are listed in the IPCC assessment, the report does 
not adequately highlight economic and technical challenges or modelling 
uncertainties, says David Victor, a political scientist at the University of 
California, San Diego, who participated in the IPCC assessment. Victor 
does not place all the blame on scientists glossing over the problems: 
when researchers drafted the assessment's chapter on emissions scenarios 
and costs, he says, they included clear statements about the difficulty of 
achieving the 2 °C goal. But the governments, led by the EU and a bloc
of developing countries, pushed for a more optimistic assessment in
the final IPCC report. “We got a lot of pushback, and the text basically
got mangled,” Victor says.

Two paths to 2 °C
Modellers have explored various scenarios for limiting global warming to 2 °C. One (left) immediately slashes fossil-fuel use
while ramping up renewable-energy use. Another strategy (right) allows continued use of fossil fuels, but bioenergy supplies
a growing share of energy. Carbon from the bioenergy industry is captured and stored, driving overall emissions below zero.

For all of the concerns and criticisms, however, modellers say that the
exercises have illuminated important research questions, such as how
much bioenergy and CCS will cost and what effects they will have on
land use, food systems and water availability.

One 2014 study² in Earth’s Future, for instance, found that it would be
difficult to grow enough bioenergy crops, even with second-generation 
cellulosic biofuels, which are made not only from a plant’s sugars but 
also from the carbon in its stem and woody materials. The effort would 
require significant boosts in crop yields and the use of 77% more nitro- 
gen fertilizer by 2100. The bioenergy would also need to be produced 
in centralized facilities that capture the bulk of the emissions. Unless 
everything goes right, scaling up to the level projected in many models 
would be difficult without significantly reducing food production or 
clearing large swathes of natural ecosystems for farmland. 

“If we need to ramp up such a large infrastructure, we need to inves- 
tigate what that implies,” says Sabine Fuss, an environmental scientist 
at the Mercator Research Institute on Global Commons and Climate 
Change in Berlin. 

Fuss led a commentary³ in Nature Climate Change in October 2014 calling
for a transdisciplinary research agenda on negative emissions. One of
the first outgrowths of that work, led by co-author Peter Smith, a biologist
at the University of Aberdeen, UK, is an upcoming assessment of carbon-negative
strategies and potential limitations. Strategies include bioenergy
with CCS, as well as other ways of absorbing carbon, such as planting
forests, using chemical scrubbers to capture CO2 directly from the air and
crushing rocks to enhance geological weathering that consumes the gas.

“The science behind these technologies is probably a bit behind the 
models,” Smith says. “This sort of provides a road map for where we 
need to go in the next two or three years.” 


RISK FACTOR 

Modellers are also digging into real-world complexities. Most models 
assume that participation in climate mitigation will be global, that coun- 
tries will put a common price on carbon, that technological solutions 
will be widely available and that this combination will drive investment 
towards relatively cheap mitigation options in developing nations. But 
the reality could be more complicated. A team at the Joint Global Change 
Research Institute worked with Victor and others to investigate the risks 
of making investments in developing countries due to political instability 


438 | NATURE | VOL 527 | 26 NOVEMBER 2015 


and the relatively poor quality of many public institutions there. Their 
model showed that investors would probably shun developing countries 
and pour money into developed ones, driving up costs and making it 
harder to curb rapidly rising emissions in developing nations. 

“The models have taught us that with unrealistic assumptions any- 
thing is possible, and with realistic assumptions it will be very hard to cut 
emissions to meet goals like 2 degrees,” Victor says. “That’s an important 
result because it forces — or should force — some sobriety about what 
can be achieved.” 

One message that modellers have delivered quite clearly is that with- 
out collective and aggressive action by all countries, costs invariably 
increase, and the chance of hitting the 2°C goal plummets. This is pre- 
cisely the situation heading into the Paris summit. Most countries, and 
all of the major greenhouse-gas emitters, have submitted pledges to 
reduce their emissions, but these vary widely in ambition. 

As it stands, the world is on a path to nearly 3°C of warming by the end 
of the century, and even that assumes substantial emissions reductions 
in the future. If nations do not go beyond their Paris pledges, the world 
could be on track to use up its 2°C carbon budget as early as 2032. If the 
models are correct, world leaders may have to either accept extra warming 
or plan for a Herculean negative-emissions campaign. In the event that 
they choose the latter — and succeed — the entire debate will change. 

“It’s a completely different game,” says Nebojsa Nakicenovic, an 
economic modeller and deputy director-general of the International 
Institute for Applied Systems Analysis in Laxenburg, Austria. “If that is 
technically possible, then we could also go below 2 degrees.” 

Fast-forward to 2100 once more. The bioenergy industry is now one 
of the largest and most powerful on Earth. People are pulling roughly 
as much CO₂ out of the atmosphere as they were emitting at the time of 
the historic Paris conference. Humanity has asserted control over the 
atmosphere, and governments face a new and difficult question at the 
108th anniversary of the UN climate convention: how low should they 
set the global thermostat? 


Jeff Tollefson writes for Nature from New York. 

After the talks 


The real business of decarbonization begins after an agreement is signed at 
the Paris climate conference, argue David G. Victor and James P. Leape. 


After years of failure to craft global 
agreements on climate change, the 
upcoming United Nations Paris 
Climate Conference is likely to turn a cor- 
ner. Diplomats have drafted a workable text 
that will probably be adopted. Businesses 
and environmental groups are engaged in 
the process in unprecedented ways. 
Governments, development banks and 
foundations are raising funds to help the 
poorest countries to pay for cutting emis- 
sions and prepare for a changing climate — 
the main sticking point in 2009, when the 
last big climate conference, in Copenhagen, 
ended in disarray. The UN and the French 
hosts have a sophisticated agenda to bring 
all these efforts together. Even religious lead- 
ers have spoken mightily of the dangers of 
unchecked climate change. 

Good news from the Paris meetings will 
build confidence, a crucial ingredient for 
effective international cooperation. Govern- 
ments and firms will invest in a future with 
lower emissions if they think that others will 
do the same. Agreement will demonstrate 
the viability of a new, flexible ‘bottom-up’ 
mode for climate diplomacy — based on 
national pledges that accommodate different 
preferences and capabilities. By contrast, the 
rigid targets and timetables of the Kyoto Pro- 
tocol appealed to few of the world’s emitters. 

Yet a dose of sobriety is also needed. 
Agreements are feasible now only because 
diplomats are postponing the thorniest prob- 
lems, such as how to hold nations account- 
able. Business engagement may prove 
ephemeral when the spotlight shifts. Good 
news about climate finance is possible now 
because the blend of public funding (which 
is hard to mobilize and spend effectively) and 
private money (which is abundant but 
rarely focused on global goals) is vague. 

Whether the Paris conference will succeed 
depends on what unfolds afterwards. Diplo- 
mats will have much to do until 2020, when 
the main accords take full effect. Civil soci- 
ety — notably business — must shift from 
making bold promises to cutting emissions. 
Governments and business must build 
and invest in review and accountability 
mechanisms to ensure that they are keep- 
ing their promises — an area in which non- 
governmental organizations (NGOs) have 
a crucial role. And scientists must pursue 
research that is directly relevant to policy- 
making, as well as assessing the underlying 
causes and impacts of climate change. 


ENGAGE BUSINESS 

Keeping business on board will be the most 
important challenge. It is easy for companies 
to make commitments when the world’s 
media and political leaders are watching. It 
is harder to implement changes when cut- 
throat competition makes it risky to invest in 
more expensive but less polluting technolo- 
gies and practices. 

The most striking example of business 
engagement is the pledges that many firms 
and governments are making to cut defor- 
estation. In 2010, the Consumer Goods 
Forum (comprising the largest retailers and 
consumer-products companies) announced 
that its members would eliminate deforesta- 
tion from their supply chains, notably for 
palm oil, soya, beef, timber and pulp. More 
than 300 companies have followed suit (see 
www.supply-change.org). Leading produc- 
ers and traders of palm oil in Indonesia — 
which accounts for half of the world’s supply 
— have promised to stop converting forest or 
peat lands. Palm oil is a main culprit in the 
fires that have spread a choking haze across 
the region since August, afflicting more than 
40 million people and often causing daily 
emissions of greenhouse gases that surpass 
those of the United States. 

It is far from assured that these pledges 
will result in lasting changes in the complex 
supply chains — from how the land is man- 
aged, to the produced oil and finally to con- 
sumer products. There are already signs of 
trouble. Most businesses pledge to become 
more sustainable following pressure from 
NGOs. (One of us, J.P.L., led WWF Interna- 
tional for nine years, during which time the 
organization was centrally involved in many 
such efforts.) Firms fear consumer backlash 
if their products are tied to environmental 
destruction (see go.nature.com/518yjm). 
After the Paris meetings, chief executives 
will need to activate changes through the 
ranks of their organizations and suppliers; 
NGOs will need both to maintain the pres- 
sure for action and to work with companies 
to secure broader reforms in major produc- 
ing countries. 

Shifting whole industries into more 
sustainable modes of production requires 
collaboration between government, business 
and civil society. Economic incentives must 
be rewired so that no firm can gain an advan- 
tage by, for example, continuing to destroy 
forest. Solutions will vary by country and 
locality, but common threads include better 
governance — laws, fiscal regimes, property 
rights and public administration — and 
investment in helping countries, communi- 
ties and small producers to make the transi- 
tion to sustainability. 

Brazil has shown what is possible. Between 
1995 and 2005, forest loss in the Brazilian 
Amazon averaged 19,500 square kilometres 
per year — roughly the area of Israel. By 
2013, that rate had been cut by 70%, even 
as beef and soya production continued 
to grow. A combination of measures was 
applied: corporate commitments coupled 
with strong laws, satellite surveillance and 
robust enforcement, restrictions on access 
to credit for farms and ranches in coun- 
ties with high deforestation, the creation of 
protected areas and indigenous reserves, 
and improvements in land tenure and 
governance. Brazil’s federal government 
worked closely with the beef and soya 
industries, NGOs and international 
partners. In 2008, for 
example, Norway committed US$1 billion 
to Brazil because it wanted to demonstrate 
practical new ways to protect forests globally. 
Even so, Brazil’s progress is fragile — defor- 
estation in the Amazon has increased over 
the past 18 months. 

Beyond forestry, industry’s commitment 
to reducing emissions is mixed. The three 
dozen firms and governments that account 
for 40% of the methane released from oil and 
gas production have pledged to eliminate 
those emissions by 2030, for example (see 
go.nature.com/beuw2z). Details on how this 
pledge will be monitored are scarce, as is a 
plan to extend the pledge to the rest of the 
global industry. 

Business is, mostly, still waiting to see 
whether the Paris conference will turn out 
to be a watershed. Governments are looking 
for signs that industry can cut emissions at 
acceptable cost and are sceptical that com- 
peting nations will take action. For all the 
good will in Paris, this chicken-or-egg prob- 
lem looms large — it explains why climate 
policy requires international cooperation, 
and why so little progress has been made 
over the past 25 years. Governments must 
grapple with huge unknowns about what 
mitigation will cost and whether other coun- 
tries will honour their commitments. Until 
confidence in international cooperation 
grows, politicians and business leaders will 
talk big but deliver small. 


NEW DIPLOMACY 

Optimism about Paris is partly rooted in a 
new bottom-up bargaining system whose 
flexibility, in theory, is suited to crafting policy 
in areas in which cooperation is essential but 
countries are unsure about what is feasible. 
National pledges — in diplomatic jargon, 
‘intended nationally determined contribu- 
tions’ (INDCs) — allow governments to 
align their commitments with national pri- 
orities. This approach has elicited firm com- 
mitments — notably from countries such as 
the United States, China and India, which are 
skittish about inflexible international legal 
commitments yet willing to do their part for 
the global whole. China's pledges, for exam- 
ple, will help to slow global warming while 
serving the country’s pressing concerns about 
reducing air pollution and achieving energy 
security. 

Pledge systems also bring dangers. The 
current round of INDCs is thin on content; 
some countries have failed to supply any 
reports, and industry has been largely absent 
from the process. Unless the pledging system 
is improved, it could become a licence to do 
nothing. This is why earlier schemes have 
yielded little practical action — as with the 
Asia-Pacific Partnership on Clean Develop- 
ment and Climate created in 2005 by then- 
US President George W. Bush after the United 
States refused to ratify the Kyoto Protocol. 
Pledges must offer enough detail and trans- 
parency for diplomats to link national efforts 
into more-ambitious, collective agreements 
in the future. A priority after Paris will be to 
develop stricter standards for national pledges 
as well as robust systems for review. 


ROAD AHEAD 

Only so much can be achieved within the 
UN system — in which consensus is usually 
required and it is easy for reluctant nations 
to block progress. Countries and firms 
will need to find ways to work in smaller, 
focused and more practical groups — in 
tandem with the broader global objectives. 
Doing this cannot rest on altruism — it 
requires attention to self-interest and, as the 
palm-oil example shows, putting pressure 
on governments and firms to rethink their 
self-interest. 

Countries that want this new flexible sys- 
tem to work should volunteer to do more 
— for example, by offering their INDCs for 
reform and review. The United States and 
China should offer their own bilateral climate 
accord, made in November last year — which 
pledged emissions curbs and efforts to con- 
duct joint research on new technologies — to 
independent scrutiny, such as by the Organi- 
sation for Economic Cooperation and Devel- 
opment or the World Bank. With a huge stake 
in showing the effectiveness of the pledging 
process, these two countries must bear the 
burden of proof. 

Firms, too, must recognize that their 
efforts will be believed only with transpar- 
ency and public accountability. Failure to 
demonstrate that corporate pledges are 
leading to tangible action will lead to 
demands after Paris for more onerous and 
costly regulation. Industry pledges should 
be reviewed alongside government com- 
mitments — and leading firms that have 
the most to gain from this new system of 
governance should invest in the needed 
independent reviews. NGOs have a key 
role in holding companies to account, 
assessing to what degree stated reductions 
are real (with no double counting) and 
identifying where extra effort is needed. 

For academics, this world of bottom-up 
diplomacy demands new skills. Periodic 
global assessments of the state of the sci- 
ence and gaps between what governments 
and firms pledge and what the planet needs 
for protection will still be needed. Equally 
urgent is interdisciplinary research pre- 
dicting how these messy, decentralized 
systems of governance will function. Scien- 
tists, including social scientists, will need to 
look, together, at how societies develop and 
implement policy reforms while assess- 
ing what works so that research is more 
informative for policymakers. 

Sceptics will see that messy reality on 
display at the Paris conference and declare 
that the event has failed to deliver on 
widely discussed goals such as stopping 
warming at 2°C above preindustrial levels. 
The better metric is whether Paris engages 
a growing share of industry and govern- 
ments in the climate task. When the meet- 
ings in Paris are done, the real business of 
decarbonization must begin. 

Corrosion costs around US$4 trillion a year globally. 


Share 
corrosion data 


To prevent disasters, Xiaogang Li and 
colleagues call for open data infrastructures to 
collate information on materials failures. 


In November 2013, an oil pipeline in 
the Chinese city of Qingdao exploded, 
killing 62 people and wounding 136. 
Eight months later, a similar explosion in 
Kaohsiung caused 32 deaths and 321 inju- 
ries. The pipelines were made of steel of the 
same specification and they failed after two 
decades of use in similar environments. The 
cause was corrosion — the degradation of 
a material by a chemical or electrochemical 
reaction with its environment. 

Such disasters are common: each square 
kilometre of any Chinese city hosts more 
than 30 kilometres of buried pipes, creating 
tangled networks of oil and gas lines, water 
mains and electrical and telecommunications 
cables. Corrosion is costly, too. According to a 
US survey, corrosion costs six cents for every 
dollar of gross domestic product in the United 
States. Globally, that amounts to more than 
US$4 trillion a year — equivalent to damages 
from 40 Hurricane Katrinas. Half of that cost 
is in corrosion prevention and control, the 
other half in damages and lost productivity. 


A lack of knowledge hinders our ability 
to prevent failures. Degradation of under- 
ground pipes, for example, is influenced 
by the compositions, microstructures and 
designs of materials, as well as by a raft of 
environmental conditions such as soil oxy- 
gen level, humidity, salinity, pH, temperature 
and biological organisms. 

Many industries, including oil, gas, 
marine and nuclear, collect corrosion data 
to identify risks, predict the service lives of 
components and control corrosion. Most of 
these data are proprietary, and best practices 
are rarely shared. Oil spills, bridge collapses 
and other disasters continue to occur. 

Demand for knowledge about corrosion is 
growing, with the increasing use of advanced 
materials in medical devices, biosensors, fuel 
cells, batteries, solar panels and microelec- 
tronics. Corrosion is the main restriction on 
many nanotechnology applications. 

Efforts to make materials data accessi- 
ble, such as the Materials Genome Initia- 
tive (MGI), focus on ‘births’ rather than 
‘deaths’ of materials. Online platforms 
for sharing corrosion data are badly needed. 
Access to a large volume and variety of cor- 
rosion information that researchers could 
probe with data mining and modelling tools 
would improve forecasts of corrosion fail- 
ures and anticorrosion designs. 


COMPLEX PROCESSES 

The biggest challenge in corrosion research 
is predicting accurately how materials will 
degrade in a given environment. It requires 
full knowledge of all relevant factors and 
their interactions. Yet precise models for 
mechanisms are lacking. Forecasting prob- 
lems is impossible without historical data 
about materials failures under various con- 
ditions. And field performances cannot be 
judged in laboratories when environmental 
parameters are unknown. 

Corrosion data are hard to collect. Damage 
may take years or decades to accumulate and 
any project tracks only a handful of contrib- 
uting factors. Data sets need to be combined. 
For example, early studies of marine corro- 
sion (occurring, for instance, on oil-drilling 
platforms) were unreliable because they con- 
sidered only physicochemical processes (those 
involving pH, dissolved oxygen and tempera- 
ture) and not the effects of organisms living in 
seawater. The inclusion of genomic data has 
now improved the models. 

Corrosion depends on local conditions. 
Steel structures that last for decades in dry 
parts of inland China fail within months in 
humid and salty coastal areas of southeast 
Asia. Protective polymer coatings that work 
for years at northern latitudes can degrade 
in weeks near the Equator, where heat and 
greater doses of ultraviolet radiation break 
chemical bonds more quickly. Inferring 
general corrosion knowledge — such as how 
particular steels are affected by humidity, salt 
or air pollution — requires combining stud- 
ies from many diverse environments. One 
worldwide survey of weathering steel, for 
example, reviewed exposure test results for 
up to 22 years from 108 sites in 22 countries. 

With global trade increasing, the oil and 
gas, construction, car, electronics and other 
industries have called for corrosion data to 
be shared between countries to ensure the 
quality and safety of their products. Millions 
of cars worldwide have been recalled in the 
past few years owing to unforeseen corrosion 
problems arising in destination countries. 
China's 2013 ‘Belt and Road’ initiative, which 
promotes industrial ties with countries along 
the Silk Road economic belt between China 
and the West, raises unprecedented chal- 
lenges. Rapid corrosion assessment, materials 
selection and design will be needed as billion- 
dollar construction, transport, energy and 
telecommunications projects begin in Asia, 
Africa and Europe. 

Advanced materials present entirely 
new corrosion problems. For example, the 
electrochemical stabilities of noble metals 
such as platinum and gold fall sharply as 
their dimensions decrease to nanometre 
scales. Corrosion of platinum nanoparticles 
remains a roadblock limiting the lifetime of 
platinum-based catalysts for fuel cells. 

Corrosion scientists have been slower 
than their materials-science peers to recog- 
nize the need for data sharing. Several large 
materials-data repositories built by US gov- 
ernmental agencies under the auspices of 
the MGI house basic physical, chemical 
and microstructure data for materials, but 
not corrosion data. Yet none of the advanced 
materials promised by 
the MGI will be practical without considering 
their environmental stability and durability. 


DATA REPOSITORIES 

Open data infrastructures should be set up 
to house corrosion data in various coun- 
tries, industries and applications. By using 
the same standardized formats for data and 
metadata, the data can be connected and 
eventually amount to a global system, pos- 
sibly linked to the MGI. 

Governments should take the lead. 
For example, the Chinese government 
has invested nearly 200 million yuan 
(US$30 million) since 2006 on a platform for 
sharing corrosion data from 30 field-testing 
stations covering standard materials in 
environments (air, soil and water) typical of 
different parts of the country. Other nations, 
industries and interest groups should estab- 
lish similar data infrastructures for corro- 
sion in other regions and sectors. 

Efforts need to be coordinated to collect 
corrosion data that are relevant to urgent 
or emerging challenges, such as alternative 
energy and nanotechnology. For instance, 
the US Department of Energy has partnered 
with the MGI to build materials data reposi- 
tories to help to speed up the development of 
alternative clean-energy sources. 

Funding agencies should incentivize the 
sharing of corrosion data about advanced 
materials and emerging technologies, 
for example, by demanding it in research 
grants and supporting the costs of publish- 
ing in open-access journals. Corrosion- 
science societies should learn from general 
materials-science societies (such as Materi- 
als Research Society, the Minerals, Metals & 
Materials Society and ASM International) 
and convene experts to establish data- 
sharing best practices and guidelines. 

Industry involvement can be encour- 
aged through partnerships with academia. 
Companies would save research and 
development costs in return for contributing 
data to repositories. Because corrosion 
concerns maintenance and safety rather than 
industrial competition, businesses should 
be willing to share such data. Data consortia 
can be formed to identify common topics of 
priority and jointly develop benchmark solu- 
tions, just as industrial standards are agreed. 

More-powerful tools need to be developed 
for data capturing, management, mining, 
modelling and simulation — the integra- 
tion of which we term corrosion big data 
and informatics. Advanced monitoring 
technologies require ‘big data’ analytics. For 
instance, robots (known as ‘smart pigs’) car- 
rying hundreds of sensors deployed to inspect 
the walls of pipelines can collect 1 terabyte of 
data in one run. Highly accurate corrosion 
simulations could partially or completely 
replace the time-consuming, environmen- 
tally unfriendly, complicated and expensive 
experimental corrosion tests. For example, 
quantum chemical simulations are heavily 
used to evaluate the molecular structures and 
electronic properties of corrosion inhibitors. 

If corrosion data are shared, everyone will 
benefit from the greater understanding that 
results. 

CORRECTION 

The Comment article ‘Einstein was no 
lone genius’ (M. Janssen and J. Renn 
Nature 527, 298-300; 2015) wrongly 
stated the dates during which Albert 
Einstein studied at the Swiss Federal 
Polytechnical School in Zurich. He was 
there between 1896 and 1900. 


SUSTAINABILITY 


The first iconic image of Earth from space 
sparked awareness of planetary boundaries. 


The launch of 
Spaceship Earth 


Adam Rome revisits five prescient classics that first made 
sustainability a public issue in the 1960s and 1970s. 


In 1969, in a book-length essay entitled 
Operating Manual for Spaceship Earth, 
the inventor and polymath Buckminster 
Fuller offered a striking metaphor for a new 
ideal of planetary management. Although 
Earth did not come with instructions, our 
spaceship had built-in safety features that 
had kept us going. Still, our pilot errors were 
catching up with us: we had been so “misus- 
ing, abusing, and polluting” the planet, Fuller 
argued, that it might need to be renamed 
“Poluto”. That way lay humanity’s oblivion. 
But if we discovered how our spaceship 
worked — if we learned to make the best 
use of our incredible ingenuity — we might 
become “comprehensively and sustainably 
successful”. 

Like everything Fuller wrote, Operating 
Manual for Spaceship Earth was idiosyn- 
cratic, at once arresting and fanciful. But 
many of the book’s basic ideas were in the 
air at the time. Between roughly 1965 and 
1975, the challenge of sustaining civiliza- 
tion inspired a shelf-full of influential books. 
They had a freshness, urgency and breadth 
that are hard to credit today, and they are still 
remarkably relevant. Now that sustainability 
as a concept has become dulled by overuse, 
they return our eyes to the prize. 

These seminal studies built on earlier 
fears. Fairfield Osborn’s Our Plundered 
Planet (Little, Brown) and William Vogt’s 
Road to Survival (W. Sloane Associates), 
both published in 1948, warned that uncon- 
trolled population growth and resource 
depletion would lead to calamity. But the 
situation seemed even more precarious 
by 1970, when the first Earth Day was cel- 
ebrated across the United States. The human 
impact on the planet had exploded after the 
Second World War, and scientific advances 
had led to greater understanding of the 
threat from those impacts. For the first time, 
many realized, we had the potential to dis- 
rupt or even destroy the planet’s life-support 
systems. The sense of environmental crisis 
was exacerbated by the social and political 
turmoil of the period. 

What would be required for humanity 
to continue to thrive? To tackle so huge a 
question required intellectual audacity, and 
the authors of the pioneering books on sus- 
tainability were all big-picture, interdisci- 
plinary thinkers par excellence. Economist 
Kenneth Boulding — author of The Meaning 
of the Twentieth Century (1964) — thought 

The Meaning of the Twentieth Century: The Great Transition 
KENNETH E. BOULDING 
Harper and Row: 1964. 

Operating Manual For Spaceship Earth 
R. BUCKMINSTER FULLER 
Southern Illinois University Press: 1969. 

The Closing Circle: Nature, Man, and Technology 
BARRY COMMONER 
Knopf: 1971. 

The Limits to Growth: A Report for the Club of Rome’s Project on the Predicament of Mankind 
DONELLA H. MEADOWS, DENNIS L. MEADOWS, JØRGEN RANDERS, AND WILLIAM W. BEHRENS III 
Universe: 1972. 

Only One Earth: The Care and Maintenance of a Small Planet 
BARBARA WARD AND RENÉ DUBOS 
W. W. Norton: 1972. 


historically and philosophically. Biologist 
Barry Commoner felt compelled to study 
political economy, as his 1971 The Closing 
Circle shows. Fuller considered himself a 
futurist. The authors of the 1972 The Limits 
to Growth — Donella Meadows, Dennis 
Meadows, Jørgen Randers and William 
Behrens — meshed environmental science 
with systems analysis. Barbara Ward was a 
journalist, economist and adviser to world 
leaders who collaborated with Pulitzer-prize- 
winning microbiologist René Dubos on Only 
One Earth (1972). 

The Meaning of the Twentieth Century 
is no longer well known, yet Boulding was 
key in framing the issue of sustainability. He 
made clear that the world that he hoped to 
sustain did not yet exist: humanity was in the 
middle of a “great transition” from an agri- 
cultural species to a thoroughly industrial 
one. In Boulding’s view, this transition was 
fraught with peril and sure to be wrenching. 
It might be derailed by nuclear war or uncon- 
trolled population growth, and might fail if 
we misused natural resources, especially fos- 
sil fuels. To succeed, we needed to create “a 
stable, closed-cycle, high-level technology” 
that would not pollute or require exhaust- 
ible materials. (He expanded on that in an 
often-reprinted 1966 essay, ‘The economics 
of the coming spaceship Earth’.) But devel- 
oping new technology was not the heart of 
Boulding’s prescription. He argued that a 
sustainable future would require countless 
“social inventions”, from new aesthetics to 
better methods of resolving disputes. “The 
unfinished tasks of the great transition are 
so enormous,” he concluded, “that there is 
hardly anyone who cannot find a role to 

NASA 


BOOKS & ARTS 


Inventor Buckminster Fuller (top) approached sustainability as a design challenge; economist Barbara 
Ward (bottom) prompted the United Nations to integrate social and environmental issues. 


play in the process.” That is still true 
now: dealing with climate change requires a 
host of skills. 

Fuller broke new ground by defining sus- 
tainability as a design challenge. Already 
famous for inventions such as the strong, 
lightweight, geodesic dome, he wrote exuber- 
antly about the need for an “industrial retool- 
ing revolution”: to achieve lasting affluence, 
we must learn to do more with less. Like Boul- 
ding, Fuller argued that we needed to treat 
fossil fuels as a short-term expedient while 
we worked out how to fashion a sustainable 
future. For a reader today, the insights of 
Fuller’s work are not enough to make up for 
the idiosyncrasies of his language and argu- 
ment. William McDonough and Michael 
Braungart’s Cradle to Cradle (North Point, 
2002) would be a much better introduction to 
sustainable design. But in 1969, Fuller’s work 
seemed thrilling, and his Operating Manual 
became a bible for people keen to invent eco- 
efficient ways of providing energy, building 
things and managing wastes. 

Commoner’s The Closing Circle laid the 
foundation for industrial ecology. Particularly 
in the postwar decades, Commoner argued, 
the industrialized world had come to rely on 
a host of “ecologically faulty” technologies, 
from nuclear power to chemical pesticides. 
The technologies of the future needed instead 
to accord with four basic principles, which 
he defined as laws of ecology: “Everything 
is connected to everything else”, “Everything 
must go somewhere”, “Nature knows best” 
and “There is no such thing as a free lunch”. 

For Commoner, however, the ultimate 
problem was economic and political, not 
technological. Discussing the economic 
meaning of ecology, he argued that the 
private-enterprise system had serious 
flaws. Businesses had powerful incentives 
to produce new products that did more 
environmental harm than the products 
they replaced. They did not need to account 
for “biological capital”, and they did not pay
the full costs of production, which included 
pollution. In the decades since The Closing 
Circle appeared, making capitalism greener 
has become a major concern of economists, 
business-school professors, entrepreneurs, 
corporate executives and activists, yet much 
of Commoner’s critique still holds. 

The Limits to Growth asked — heretically 
— whether humans could continue indefi- 
nitely to make ever greater demands on 
Earth. The authors used computer modelling 
to explore the interactions between popula- 
tion growth, resource demand, industriali- 
zation, food production and pollution. They 
did not forecast the future, although com- 
mentators ever since have debated whether 
their ‘predictions’ were right; instead, they 
extrapolated. If present trends continued, 
the authors wrote, humanity would hit the 
wall “sometime within the next hundred 
years”. They hoped that people would avert 
a breakdown, but stated repeatedly that they 
could not model the social, political and cul- 
tural factors that might alter trends. They 
did consider whether technology could be a 
magic bullet, and the results were shocking. 
Even when they allowed for the technological 
progress that greatly increased the availability 
of resources and reduced the amount of pol- 
lution, the result was still collapse — just far- 
ther down the road. Innovation alone could 
not lead to a sustainable economy. We needed
a fundamental shift in values.

The Limits to Growth was an international 
sensation, selling over 12 million copies in 
more than 30 languages. Meadows, Meadows 
and Randers updated the analysis in 1993 and 
again in 2004, and the question of limits still 
prompts vigorous debate. Johan Rockström
and Mattias Klum's Big World, Small Planet 
(Yale University Press, 2015) and Donald 
Worster’s Shrinking the Earth (Oxford Uni- 
versity Press, 2016) are just two of the many 
books now probing the problem of growth. 

Ward and Dubos’s Only One Earth, writ- 
ten to accompany the 1972 United Nations 
Conference on the Human Environment, 
added an international perspective to the 
sustainability discussion. Ward had travelled 
the globe as an expert on economic develop- 
ment. A preliminary draft of the book was 
circulated for comment to scientific, busi- 
ness and intellectual leaders from 58 coun- 
tries, and the result is worth reading just for 
the summary of their responses, which made 
clear that people around the world held very 
different views about environmental issues. 
A European respondent argued for a retreat 
from industrialization, for example, whereas 
an Asian statesman wrote that developing 
nations could not afford “dreams of land- 
scapes innocent of chimney stacks”. 

For Ward and Dubos, any effort to ensure 
the survival of humanity had to bridge the 
tremendous gap between developed and 
developing nations. Although they didn’t 
use the phrase ‘sustainable development’,
they offered a path-breaking analysis of the 
challenge of raising living standards for the 
poor without degrading the environment. 
At the same time, they called for the affluent 
to take off their blinkers. Well-to-do nations 
needed to acknowledge the damage that they 
were doing to the biosphere — and to accept 
that their fate was inseparable from the pros- 
pects of the rest of the world. Because many 
environmental threats were global, Ward and 
Dubos concluded, “planetary interdepend- 
ence” had to become a moral and political 
reality, not just “a hard and inescapable sci- 
entific fact”. The UN Paris Climate Change 
Conference starting this month will be a test 
of how close we are to meeting that aim. 

Read together, the books of this charged 
decade demonstrate that building a sus- 
tainable civilization is multidimensional. It 
sweeps everything in: science and technol- 
ogy, politics, economics, social relationships, 
ethics. We cannot advance in a straight line.
We need to approach the goal from many
directions, with flexibility and tenacity.

Books in brief


The Secret of Our Success: How Culture Is Driving Human
Evolution, Domesticating Our Species, and Making Us Smarter

Joseph Henrich PRINCETON UNIVERSITY PRESS (2015)

The force propelling Homo sapiens down its unique evolutionary
pathway is “culture-gene coevolution”, avers anthropologist (and
aerospace engineer) Joseph Henrich. Over time, he posits, the need
to acquire “adaptive cultural information” expanded the human
brain, and societies’ “collective brains” in turn shaped human
culture. Integrating insights from cognitive psychology, experimental
economics, history and ethnography, this limber and lucid study
concludes that we face a major transition into a new type of animal.


The Last of the Light: About Twilight 

Peter Davidson REAKTION (2015) 

Cultural historian Peter Davidson enters the twilight zone, tracing 
the crepuscular in science, psychology, history and the arts. 
Considering the 60th parallel north, around which “long evenings 
and protracted sunsets stretch”, Davidson probes aspects of this 
transitional state, including visual perception during the stages of 
twilight (civil, nautical and astronomical); dusk as a metaphor for 
crisis in Charles Dickens’s Bleak House; the proliferation of gilt and 
mirrors in the murky pre-electric era; and the poet Gerard Manley 
Hopkins’ observations of anti-crepuscular rays, published in Nature. 


London Fog: The Biography 

Christine L. Corton BELKNAP (2015) 

London’s ‘pea-soupers’ — opaque, yellowish smogs — were an 
environmental catastrophe, a cloak for nefarious activities and an 
artistic inspiration. An odiferous wig of soot from coal fires, sulfur 
dioxide and mist settled regularly over the city from the 1840s to the 
1960s. In this richly nuanced history, scholar Christine Corton takes 
us from polymath Robert Hooke spotting a pall of smoke over London 
in 1676 through the killer fogs that felled zoo animals, spurred crime 
and caused traffic accidents, and that ultimately galvanized scientists 
and the government to craft the 1956 Clean Air Act. 


The Secrets of Sand: A Journey into the Amazing Microscopic 
World of Sand 

Gary Greenberg, Carol Kiely and Kate Clover VOYAGEUR (2015) 
Beachcombers take heed: the real treasure is stuck to your soles. 
Sand — as cell biologist Gary Greenberg, microscopist Carol Kiely 
and science curator Kate Clover show in this delightful coffee-table 
book — is dazzling, from star-shaped forams to egg-like ooids. To 
photograph these minuscule jewels rock-polished by wind and surf, 
Greenberg used 3D microscopes and smart lighting. A stunning
extra is the imagery of the lunar dust particles that Kiely studies,
including glassy spherules from extinct fire-fountain volcanoes. 


The Best American Infographics 2015 

Gareth Cook and Maria Popova MARINER (2015) 

Another year, another superb volume in this infographics series 
edited by journalist Gareth Cook; cultural curator Maria Popova (of 
blog ‘Brain Pickings’) guest-introduces. ‘What Do Americans Speak?’ 
(Slate, 13 May 2014) offers an eye-popping map showing the third 
most commonly spoken language in each US state — in Michigan, 
that is Arabic — and Nature’s own ‘Born Here, Died There’ (Nature 
http://doi.org/8xg; 2014) explores dynamic patterns in cultural 
history through an elegant animation. Barbara Kiser

Gene editing: heed
disability views 


CRISPR-Cas9 is a gene- 
editing tool of great potential, 
although not necessarily from 
a disability-rights perspective 
(see D. J. H. Mathews et al. 
Nature 527, 159-161; 2015). 
People with disabilities are, in 
my view, unlikely to be queuing 
up for genetic modification: 
their priority is to combat 
discrimination and prejudice. 

To ‘fix’ a genetic variation that 
causes a rare disease may seem 
an obvious act of beneficence. 
But such intervention assumes 
that there is robust consensus 
about the boundaries between 
normal variation and disability. 
Contrary to the prevailing 
assumption, most people with 
disabilities report a quality of life 
that is equivalent to that of non- 
disabled people (G. L. Albrecht 
and P. J. Devlieger Soc. Sci. Med. 
48, 977-988; 1999). 

The UK Nuffield Council 
on Bioethics is deliberating the 
ethical and social dimensions 
of CRISPR. International 
guidelines are urgently needed 
(Nature 526, 310-311; 2015), 
and the voices of people living 
with illness and impairment 
need to be heard. 
Tom Shakespeare University of 
East Anglia, Norwich, UK. 
tom.shakespeare@uea.ac.uk 


Gene editing: govern 
ability expectations 


From a disability-rights 
viewpoint, problems that have 
dogged the debate on human 
genetic modification (see 
go.nature.com/6wb45k) also 
pervade your curtain-raiser
to the US National Academies
of Sciences, Engineering and
Medicine conference (see
D. J. H. Mathews et al. Nature
527, 159-161; 2015). The
authors’ portrayal of the public
as a passive recipient of ‘wisdom’
from ‘experts’ goes against
healthy discourse on responsible
research and governance.

The disability-rights 
community has a history 
of disagreement with such 
experts (including authorities, 
scientists and clinicians) over 
their perception of people with 
disabilities. This is summarized 
as ‘ableism’, a view that disability 
is an abnormality instead of a
feature of human diversity. It 
can lead to flawed ‘solutions’ and 
disempower those affected (see 
G. Wolbring J. Crit. Anim. Stud. 
12, 118-141; 2014). 

“It is time to collectively make
decisions about the kind of
world we want to live in,” write
Mathews and colleagues. This 
discussion should include ability 
expectations and how they 
should be governed. 

Gregor Wolbring University of 
Calgary, Alberta, Canada. 
gwolbrin@ucalgary.ca 


Gene editing: survey 
invites opinions 


As the US National Academies 
of Sciences, Engineering and 
Medicine summit on the 
regulation of CRISPR-Cas9 
gene-editing tools gets under 
way, we invite readers to 
contribute their opinions about 
this technology and its use to a 
survey at go.nature.com/eyowaf. 
Public engagement in decisions 
about applications of science 
and technology that affect 
society is essential. The summit 
is, to a degree, modelled on 
the 1975 Asilomar Conference 
on the potential biohazards of 
recombinant DNA (see Nature 
http://doi.org/899; 2015). It must 
not make the same mistake of 
being held behind closed doors. 
As one survey contributor 
remarks, it may be impossible “to 
get this [CRISPR-Cas9] genie 
back into the bottle”. So when it 
comes to wishes for the genie, 
those of both scientists and the 
public must be considered. 
Silvia Camporesi, Lara Marks 
King’s College London, UK. 
silvia.1.camporesi@kcl.ac.uk


Climate change also
creates expatriates 


I visited the island of Tuvalu in 
the Pacific Ocean three decades 
ago as the environmental 
assessor for an aid-funded 
engineering consultancy. 
Pollution of the freshwater lens 
and scavenging of protective 
shoreline coral rubble for 
construction were problems even 
then. As you note (see Nature 
526, 624-627; 2015), these may 
drive exodus sooner than rising 
sea levels. 

Nobody likes to be forced out 
of their home. But small oceanic 
nations hold a valuable asset: 
sovereignty. Tuvalu already 
profits from its own Internet 
domain (.tv), and sovereign 
nations have United Nations 
votes, which are effectively on 
the market. They can operate 
attractive tax regimes. They can 
declare marine reserves and 
sell rights to fisheries, seabed 
mining or reef tourism. All of 
these make money, and it does 
not have to be divided between 
many people. They can all be 
done even if nobody lives there 
in person. Citizens of such 
small island nations could thus 
become well-off expatriates, as 
well as refugees. 

Ralf Buckley Griffith University, 
Gold Coast, Australia. 
r.buckley@griffith.edu.au 


Crowdfunding not 
fit for clinical trials 


Crowdfunding can raise money 
quickly and with minimal 
bureaucracy. But it should not be 
considered as a way to finance 
clinical trials because of potential 
ethical implications. 

One problem is that funding 
recipients are not accountable to 
the public because crowdfunding 
is unregulated. Another is that 
there is no setting of research 
priorities, so crowdfunded 
clinical trials may not be the 
most important or widely 
applicable ones. And media
tactics could attract emotional
donations, for example by
generating false expectations of a
‘cure’. Moreover, an inconclusive
or negative outcome could erode 
public trust. 

By contrast, the mainstream 
funding process for clinical 
trials takes into account disease 
prevalence, morbidity and 
mortality, justice and utility. 
Crowdfunding for clinical trials 
should be similarly regulated to 
mitigate its potential risks. 
Phaik Yeong Cheah University 
of Oxford, UK. 
phaikyeong@tropmedres.ac 


Lessons from EPA on 
tracking pollutants 


In our opinion, China could 
learn from the success of the
US Environmental Protection
Agency (EPA) in providing 
open-access environmental 
information to the public. This 
would enhance the credibility of 
government decisions. 

The EPA’s Toxics Release
Inventory programme, in 
partnership with state agencies, 
collects data from enterprises 
that must report emissions. It 
subjects this information to 
quality-assurance reviews, trends 
analysis and error correction, 
as well as making it publicly 
available. This evaluation of the 
entire information-flow process 
increases transparency and 
accountability. 

Using a comparable holistic 
approach, China’s Ministry of 
Environmental Protection could 
develop a secure access point 
for ministries and agencies and 
a web portal for public access.
A designated group might set
quality standards and policies 
for handling such information, 
akin to the EPA’s Office of 
Environmental Information. 
Bo Zhang Information Center, 
Ministry of Environmental 
Protection, Beijing, China. 
Wayne S. Davis EPA, 
Washington DC, USA. 
zhangbo@mep.gov.cn


Acorn worms in a nutshell


The genome sequences of two members of the hemichordate group of marine invertebrates bring the evolution of their 
relatives, including vertebrates, into sharper focus. SEE ARTICLE P.459 


CASEY W. DUNN 


By examining the similarities and
differences among the genomes of
living organisms, we can reconstruct
features of the genomes of long-dead ances- 
tors. Such reconstructions provide insight 
into patterns of genome diversity and how 
organisms evolved through the gain, loss and 
modification of genomic features. The greater 
the number of sequenced genomes from living 
organisms, and the broader their distribution 
across the tree of life, the better is our view of 
these ancestral genomes. However, although 
hundreds of animal genomes have been pub- 
lished in recent decades, the vast majority are 
from only two groups: vertebrates and arthropods.
Simakov and colleagues’ publication¹
in this issue (page 459) of genome sequences 
for two species from a group of invertebrates 
known as hemichordates takes the sampling 
of animal genomes an important step forward. 

Hemichordates are exclusively marine 
animals. The adults live on the ocean bottom, 
whereas the larvae are free-swimming. There 
are about 130 described species, which are
divided into 2 groups. The pterobranchs, of 
which there are around 20 species, are small 
animals (up to about 5 millimetres long) that 
form colonies of asexually produced clones 
attached to a central disk by fleshy tethers.
The animals live in a tube network that they 
secrete. Just as birds were found to be ‘living 
dinosaurs’ — a group that had been thought 
extinct — pterobranchs are living graptolites, 
animals that are abundant in the fossil record.
In contrast to pterobranchs, the other group of 
hemichordates, called enteropneusts, are soli- 
tary animals that range in length from less than 
a millimetre to more than 2 metres (Fig. 1).
Known as acorn worms, enteropneust adults 
burrow in soft sediments. 

Simakov and colleagues present the genome 
sequences of two enteropneusts — Saccoglossus 
kowalevskii and Ptychodera flava. The authors 
used these sequences, together with additional 
DNA sequence data on pterobranchs and sev- 
eral other animals, to build a phylogenetic tree 
that finds pterobranchs and enteropneusts to 
be sister groups (Fig. 2). This finding is in 
agreement with another analysis in rejecting
the previously suggested placement of
pterobranchs within enteropneusts.

Figure 1 | Pharyngeal gill slits. Enteropneusts, better known as acorn worms, use internal gill slits in the
pharynx region of their trunk to move water through their mouth to obtain oxygen and, in some species,
for filter feeding. The gill slits connect to external gill pores. This specimen is several centimetres long.

As interesting as hemichordates are in 
their own right, much of the motivation for
taking a closer look at them comes from a 
desire to understand their relatives. This is 
because hemichordates fall within Deutero- 
stomia, the group of animals that also includes 
echinoderms (radially symmetrical organ- 
isms such as sea stars and sea urchins) and 
chordates. Chordates are of particular interest 
because they include humans and our verte- 
brate kin. Although many chordate genome 
sequences are available, there are few genome 
resources for other deuterostomes. A draft 
genome for a sea urchin is available, but until
now there were no published genomes for 
hemichordates. 

The most recent common ancestor of 
deuterostomes lived more than 500 mil- 
lion years ago, and there is great diversity in 
the anatomy of adults of this group. However, 
many features of deuterostome embryology, 
including the formation of the anus from 
the blastopore and the creation of coelomic
cavities by pinching off from the gut, are highly
evolutionarily conserved. The main finding of 
Simakov and colleagues’ study is that deuter- 
ostome genomes, like their embryology, show 
extensive conservation across great evolution- 
ary timescales. The hemichordate sequences 
share many features with other deuterostome 
genomes, including gene composition, exon- 
intron structure and small- and large-scale 
gene order. This means that many well-char- 
acterized features of chordate genomes are not 
chordate-specific, but arose earlier in animal 
evolution. 

One of the most conspicuous deuterostome-specific
traits is the pharyngeal gill slits. These
openings allow water to pass through the
mouth without entering the digestive tract,
and they are involved in feeding and respiration
in these animals. Gill slits arose in the stem
lineage that gave rise to deuterostomes, and are
not found in non-deuterostome animals, nor
in echinoderms, in which they were secondarily
lost (Fig. 2). A detailed understanding of
the evolutionary origin of this feature is key
to understanding deuterostome, and therefore
our own, biology. Simakov et al. examine
a conserved cluster of six genes that is found
only in deuterostomes, and that includes genes
known to be involved in patterning gill slits in
other deuterostome species. In keeping with
the other conserved genome features that they
identified, the authors find that these genes are
also expressed in the pharyngeal-gill structure
of hemichordates.

Figure 2 | Deuterostome relationships. The deuterostome group can be divided into three clades:
chordates (Cephalochordata, or lancelets; Craniata, which includes all vertebrates; and Urochordata, such
as sea squirts); echinoderms (sea stars, sea urchins and relatives); and hemichordates (pterobranchs and
enteropneusts). Simakov et al.¹ present the first hemichordate genome sequences, from two enteropneust
species. The authors’ analyses provide new detail on evolutionarily conserved genes that play a part in the
development of gill slits. These structures arose along the deuterostome stem, were lost in echinoderms
and are reduced in the adults of some chordates (including humans).

Simakov and colleagues recognize that there 
is no ‘typical’ representative of any animal 
group; they sequenced two full hemichordate 
genomes, and collected less-detailed sequence 
data from a variety of other species to put these 
genomes in a richer evolutionary context. 
However, the authors’ study still faces the same
challenge as all genome investigations: we
are far from understanding which evolutionary
changes in genomes underlie which evolutionary
changes in traits, including development,
anatomy and functional biology.

There are several reasons for this. First,
many evolutionary genome changes are
neutral: they have no impact on traits or
fitness. This means that we should not assume
that any particular genome change affects
any traits. Second, on any given phylogenetic
branch there will be many changes in both
traits and genomes, and there are many possible
functional implications for any particular
genome change. Third, genome function
itself evolves, so the same genome features
do not necessarily relate to the same traits in
different species.

Our current coarse perspective on genome
evolution will improve as more genomes are
sequenced, and as functional genomic tools are
developed that can be applied to any organism,
not just those that can be grown in the
laboratory. The conservation of so many features
across deuterostome genomes, which is
brought into sharp focus with Simakov and
colleagues’ addition of hemichordate genome
sequences, reinforces the fact that radical
morphological changes are not necessarily
related to radical changes in genomes. This
fact will shape the search for which of the variable
features of deuterostome genomes are
responsible for the great diversity we see across
the group.


Casey W. Dunn is in the Department of
Ecology and Evolutionary Biology,
Brown University, Providence,
Rhode Island 02912, USA.
e-mail: casey_dunn@brown.edu


CIRCADIAN CLOCKS


A receptor for subtle
temperature changes


The protein IR25a is best known for its role as an odour receptor in flies, but an
analysis reveals that it also acts to synchronize the circadian clock by sensing
small temperature fluctuations. SEE LETTER P.516


FRANCOIS ROUYER
& ABHISHEK CHATTERJEE


Our body’s circadian clocks sense
the environmental changes that
occur over 24 hours, allowing us
to adapt our physiology and behaviour to
day-night cycles. Light and temperature have
by far the greatest influence on the clock that
drives rest-activity rhythms, but how the
neurons of this clock synchronize to temperature
in the brain remains largely unknown.
On page 516 of this issue, Chen et al.¹ identify
a receptor protein in mechanosensory organs
in flies that acts as a specialized temperature
sensor, synchronizing the circadian clock with
low-amplitude temperature cycles.

Small daily fluctuations of just 1-2 °C are
enough to synchronize the fly brain’s circadian
clock with temperature (a process known as
temperature entrainment). Experiments
using cultures of different fly body parts have
revealed that most organs can entrain their
clocks with temperature cycles. The exception
is the brain, which must thus rely on external
sensors. Expression of the nocte gene is
required both for the normal development of
mechanosensory structures called chordotonal
organs and for temperature entrainment of the
brain clock. This suggests that chordotonal
organs, which are present in the antennae and
body parts such as legs and wing hinges, are
the external sensors. Although the antennae
are the major temperature-sensing organ,
they are not essential for entrainment, indicating
that the body chordotonal organs can
do the job.

To analyse the role of the body chordotonal
organs in temperature entrainment of the
brain’s rest-activity clock, Chen et al. looked
for proteins that interact with the Nocte protein.
Among the putative Nocte-binding
partners that they identified was the protein
Ionotropic Receptor 25a (IR25a). IR25a is part
of the IR family, members of which are found
in sensory organs and are involved in detecting
chemicals. So far, IR25a has been best known
for its role as a component of a multimeric
odour receptor in antennae. As expected for
a Nocte partner, the authors found that IR25a
was present in the sensory neurons of
chordotonal organs.

Chen and colleagues investigated
whether flies lacking IR25a could synchronize
their rest-activity rhythms
to temperature changes using several
different entrainment protocols. For
example, they entrained flies using
regular 12-hour intervals of light and
dark, and then kept the entrained animals
in the dark while applying a new
regime of large-amplitude temperature
cycles (between 16 and 25 °C or
20 and 29 °C), 7 hours ahead of the old
light-dark cycle. Both wild-type flies
and IR25a mutants quickly reset their
clock, becoming most active at the
end of the warm phase, whereas nocte
mutants did not. However, when the
new regime involved low-amplitude
temperature fluctuations (18-20 °C,
21-23 °C and 25-27 °C), the IR25a
mutants failed to adapt (Fig. 1). IR25a
is thus required for synchronizing the
circadian system with low-amplitude
temperature changes, apparently
independently of the temperature
range. The authors also provide data
to show that IR25a acts in the body
chordotonal organs, rather than the
antennae.

If IR25a acts as a temperature
sensor in sensory neurons upstream
of the rest-activity clock, its loss
should prevent synchronization
with temperature not only of behaviour,
but also of the oscillations
of the proteins that make up the
clock itself. The rest-activity clock
involves protein oscillations in about
150 neurons, from around 6 subsets.
Chen et al. observed complex defects
in clock-protein oscillations during shallow
temperature cycles in IR25a mutants, whereas
oscillations were normal during light-dark
cycles.

Most clock-neuron subsets were affected,
but the authors observed intriguing differences
when the temperature cycles were applied in
constant light or darkness. Notably, some of the
proteins in some groups, such as ventral lateral
neurons (LNvs), stopped cycling in darkness,
whereas others, such as DN2 dorsal neurons,
stopped in light. How temperature information
travels in the clock network remains to
be discovered, but the authors revealed a key
role for DN groups: blocking their neuronal
output with tetanus toxin prevented behavioural
synchronization to shallow temperature
cycles. Interestingly, this finding sits well with
previous observations that DN2 neurons are
involved in the temperature entrainment of
the larval clock, and temperature preference
rhythms in the adult.

Finally, Chen et al. demonstrated the role
of IR25a in neuronal responses to small
temperature changes. Whereas sensory neurons
in the legs of both wild-type and IR25a-mutant
flies responded to movements of the
leg, only wild-type neurons were activated by
temperature changes. Moreover, when IR25a
was misexpressed in a population of large
LNv clock neurons, their activity increased in
response to small temperature fluctuations.
Thus, the presence of IR25a is sufficient to
induce neuronal responses to temperature,
even in the absence of its partners in the
multimeric olfactory receptor.

Figure 1 | Telling time with temperature. Chen et al.¹ investigated
whether flies lacking the protein IR25a can respond to temperature
changes to reset the circadian clock in their brain that controls
rest-activity rhythms. a, The authors synchronized the flies’ clock to
12-hour light-dark cycles, in which activity peaked twice during light
hours. The flies were then kept in constant darkness and exposed to
different temperature cycles. b, At constant temperature, the insects’
activity changed, but was still synchronized to the original light-dark
cycle. c, When exposed to large temperature cycles (variations of 9 °C)
that fluctuated 7 hours ahead of the light-dark cycle, the clock was
reset in both types of fly, so that the activity peak occurred at the end
of the warm period. d, When exposed to low-amplitude fluctuations
(2 °C), flies lacking IR25a could not respond, demonstrating that IR25a
mediates the clock’s synchronization with small temperature changes.
Panel labels: Wild type, No IR25a. Axis labels: Activity, Temperature, Time; Light, Dark.

Chen and colleagues’ study deftly
demonstrates the temperature sensitivity of
the rest-activity clock, and suggests that the
chordotonal organs are key players in the
clock’s temperature entrainment. However,
the inability of nocte mutants to respond
to large temperature cycles indicates that
the chordotonal organs also mediate the
response of circadian rhythms to larger temperature
changes. Whether IR25a might play
a part here is unclear, but the process clearly
involves other temperature sensors as well.
One candidate is the ion-channel
protein Pyrexia, which is found in
the chordotonal organs and seems
to be needed for entrainment in
low-temperature conditions. Another
channel, TrpA1, has been implicated
in the control of the fly clock by temperature.
However, a consensus is
emerging for a role for TrpA1 in the
temperature-dependent regulation
of afternoon siestas, rather than in
entrainment.

The neuronal circuits that pass
temperature information from the
chordotonal organs to the brain
also remain unknown. The complex
changes in protein oscillations caused
by loss of IR25a point to a role for
several groups of neurons. But
whether different temperatures or
amplitudes of change target specific
subsets of clock neuron is an open
question. In addition, Chen and
colleagues’ work suggests that the
clock network responds differently
to temperature in constant light
or darkness, suggesting that light
inputs strongly affect temperature
entrainment.

Although light can entrain the 
brain clock through both internal 
and external sensors, temperature 
entrainment seems to rely on external 
sensors. The benefit of preventing the 
brain clock from directly sensing tem- 
perature is unclear. In mammals, the 
rest-activity clock, which is located in 
a brain region called the suprachias- 
matic nuclei, also controls body-tem- 
perature rhythms and uses them to 
synchronize peripheral clocks;
it therefore makes sense that these 
nuclei contain temperature-resistant neuronal 
networks. But flies do not regulate their body 
temperature, so why is their brain clock tem- 
perature-resistant? Perhaps this organization 
prevents any overly strong effects of tempera- 
ture in favour of light, whose daily oscillation 
might be more reliable. Perhaps it mediates 
a balance between light and temperature, 
allowing one entrainment circuit to influence 
the other. Deciphering how the two sensory 
modalities are integrated by the clock neuronal 
network will be an exciting challenge for the 
next few years.

Super-resolution ultrasound


By infusing blood vessels with gas-filled microbubbles and using rapid 
ultrasound imaging to detect the bubbles, super-resolution imaging of 
an entire vessel system has been achieved in a rat brain. SEE LETTER P.499 


BEN COX & PAUL BEARD 


Ultrasound imaging is used in hospitals
throughout the world as a safe, non-invasive
and relatively inexpensive
way to visualize a patient’s internal tissues in 
real time. The quality of ultrasound images has 
been improving steadily since the 1970s, owing 
to advances in hardware and image-forming 
algorithms. But, like all wave-based imaging 
techniques, ultrasound faces a fundamental 
limit because of the way that waves spread out 
(diffract) as they travel — two objects are dis- 
tinguishable from one another only if they are 
more than half a wavelength apart. On page 499 
of this issue, Errico et al.¹ overcome this limit to
produce super-resolution images of the microvasculature
in the brain of a live rat.

If the resolution limit of ultrasound imag- 
ing is related to wavelength, why not just use 
sound with a shorter wavelength? Although 
this approach is useful to some extent, the 
absorption of ultrasound waves increases
strongly as the wavelength decreases; therefore, 
using shorter wavelengths limits the depth to 
which tissue can be imaged before the reflected 
waves are attenuated too much to be detected. 
As a result, the resolution limit in clinical 
ultrasound imaging is at best hundreds of 
micrometres. To take useful images at depth, 
it is therefore necessary to bypass the half- 
wavelength limit. 
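
To put rough numbers on this trade-off, the following is a minimal sketch of the half-wavelength limit. It assumes a nominal speed of sound in soft tissue of about 1,540 metres per second, a textbook value that is not given in this article, and the function name and example frequencies are purely illustrative.

```python
# Illustrative only: diffraction-limited resolution taken as half a wavelength.
# The speed of sound is an assumed nominal value for soft tissue.
SPEED_OF_SOUND_TISSUE_M_PER_S = 1540.0


def half_wavelength_um(frequency_mhz: float,
                       speed_m_per_s: float = SPEED_OF_SOUND_TISSUE_M_PER_S) -> float:
    """Return half the acoustic wavelength, in micrometres."""
    wavelength_m = speed_m_per_s / (frequency_mhz * 1e6)  # lambda = c / f
    return 0.5 * wavelength_m * 1e6


# Hypothetical example frequencies spanning a typical diagnostic range.
for f_mhz in (2.5, 5.0, 15.0):
    print(f"{f_mhz:5.1f} MHz -> half-wavelength ~ {half_wavelength_um(f_mhz):.0f} um")
```

Under these assumptions, frequencies of a few megahertz give a limit of the order of hundreds of micrometres, in line with the figure quoted above; raising the frequency shrinks the limit but, as the text notes, at the cost of imaging depth.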

The same fundamental resolution limit is 
found in light microscopy. But the develop- 
ment of several super-resolution techniques, 
such as photoactivated localization micro- 
scopy (PALM), has enabled researchers to 
achieve nanoscale resolution — breakthroughs 
for which the 2014 Nobel Prize in Chemistry 
was awarded. 

PALM achieves super-resolution imaging 
in three steps. The first step is to image light- 
activated fluorescent molecules that act as tiny, 
randomly distributed pinpricks of light. The 
use of low light intensities and the fact that the 
molecules’ activation is inherently random
ensure that only a sparse subset is turned on
at any one time. Thus, these point-like light 
sources are separated by more than half a 
wavelength, so the image of each one (a blurred 
spot called the point spread function) does not 
overlap with that of its neighbours. 

The second step is to determine the exact 
position of each point-like source by find- 
ing the centre of the point spread function. 
This is possible for well-separated sources, 
because the shape of the point spread func- 
tion can be known in advance. The final step 
is to repeat the illumination and detection 
steps many times. A different set of separated 
point-like sources is detected each time, until 
a sufficient density of source points has been 
obtained. By marking the positions of all of 
these point sources on a single meta-image, 
a super-resolved picture can be built up. The 
spatial resolution in this image can exceed the 
diffraction limit, because it is determined by 
the accuracy with which the position of each 
source can be estimated. 
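
The three steps can be sketched in a few lines of code. This is a noise-free toy model, not the PALM processing chain itself: the grid size, blur width and helper names are hypothetical, the point spread function is assumed to be Gaussian, and each frame is assumed to contain a single isolated source (the sparse limit described above).

```python
import math
import random

GRID = 32          # pixels per side of a hypothetical detector
PSF_SIGMA = 2.0    # assumed Gaussian blur width, in pixels


def render_spot(x0, y0):
    """Step 1: image of one point source, blurred by the point spread function."""
    return [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * PSF_SIGMA ** 2))
             for x in range(GRID)] for y in range(GRID)]


def centroid(image):
    """Step 2: estimate the spot centre as the intensity-weighted mean position."""
    total = sum(sum(row) for row in image)
    cx = sum(v * x for row in image for x, v in enumerate(row)) / total
    cy = sum(v * y for y, row in enumerate(image) for v in row) / total
    return cx, cy


def localize_all(sources):
    """Step 3: repeat for many frames and accumulate the estimated positions."""
    return [centroid(render_spot(x, y)) for x, y in sources]


rng = random.Random(0)
true_sources = [(rng.uniform(10, 22), rng.uniform(10, 22)) for _ in range(20)]
estimates = localize_all(true_sources)
worst_error = max(math.hypot(ex - tx, ey - ty)
                  for (ex, ey), (tx, ty) in zip(estimates, true_sources))
print(f"worst localization error: {worst_error:.4f} px (blur width {PSF_SIGMA} px)")
```

In this idealized setting the positions are recovered far more precisely than the blur width, which is the sense in which resolution is set by localization accuracy rather than by diffraction. With real data, noise and overlapping spots would limit that accuracy.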

Could a similar approach be used to achieve
super-resolution ultrasound imaging? The 
first challenge is to identify potential point 
sources (point scatterers in the case of ultra- 
sound). Because small blood vessels are poor at 
reflecting sound waves, they can be hard to see 
using ultrasound, and gas-filled microbubbles, 
which reflect sound well, have long been used 
as contrast agents to enhance vessel visibility. 
Microbubbles are strong scatterers of sound, 
and so are good candidates as point sources. 
But to be useful in this context, there must be 
some way to identify them separately in ultrasound
images.

Figure 1 | Vessels visualized. Errico et al.¹ obtained ultrasound images of rat blood vessels infused with gas-filled microbubbles, which reflect ultrasound
waves. By taking images rapidly (at 500 frames per second) and generating difference data to compare sequential images, they were able to pinpoint the locations
of the few, well-separated microbubbles that degraded between each image (only a small number of microbubbles are shown here for simplicity, and are not
to scale). By repeating this process over many frames, a composite image was built up that revealed the locations of many thousands of microbubbles, making
super-resolution images possible in around 150 seconds.

In 2013, researchers achieved super-resolution
ultrasound imaging by using a
sufficiently dilute solution of microbubbles to
achieve the necessary separation. Earlier this
year, the same group used this approach to
obtain super-resolution images of the microvasculature
in a mouse ear to a depth of more
than one centimetre. They also tracked the
microbubbles, to estimate blood-flow velocity. 
However, their system acquired images at the 
low rate of 25 frames per second, which meant 
that hour-long imaging times were needed to 
achieve super-resolution. 

Errico et al. used a different approach that 
dispensed with the need for a dilute micro- 
bubble solution. By using a high-frame-rate 
imaging system (500 frames per second), 
they were able to detect the waves scattered 
from individual microbubbles by comparing 
sequential images. Signals from bubbles that 
have disintegrated or moved significantly in 
the time between frames can be detected in the 
data and, as long as these changes are sparse
enough to be spatially separated from one
another, the positions and velocities of these
microbubbles can be accurately determined
(Fig. 1). The authors compared 75,000 images 
taken over about 150 seconds to build up 
super-resolution images of the vasculature 
in the cortical region of rat brains, through 
both intact rat skulls and skulls that had been 
thinned to reduce the acoustic attenuation. 
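
The idea of isolating a bubble by comparing sequential images can be illustrated with a one-dimensional toy model. This is a sketch under stated assumptions, not the authors' processing pipeline: the pixel count, blur width, bubble positions and function names are all hypothetical, and the echoes are modelled as noise-free Gaussian blurs.

```python
import math

N_PIXELS = 200
BLUR_SIGMA = 6.0   # hypothetical blur width, in pixels


def frame(bubble_positions):
    """1D 'image': the sum of blurred echoes from every bubble present."""
    return [sum(math.exp(-((x - p) ** 2) / (2 * BLUR_SIGMA ** 2))
                for p in bubble_positions)
            for x in range(N_PIXELS)]


def locate_change(frame_a, frame_b):
    """Centroid of the signal present in frame_a but absent from frame_b."""
    diff = [max(a - b, 0.0) for a, b in zip(frame_a, frame_b)]
    return sum(x * d for x, d in enumerate(diff)) / sum(diff)


# Densely packed bubbles whose blurred echoes overlap heavily.
bubbles = [80.0, 86.5, 93.0, 99.5, 106.0]
before = frame(bubbles)
after = frame([p for p in bubbles if p != 93.0])   # one bubble disintegrates

estimate = locate_change(before, after)
print(f"vanished bubble located at {estimate:.2f} (true position 93.0)")

# Acquisition arithmetic quoted in the text.
frames_total = 500 * 150
print(f"500 frames/s x 150 s = {frames_total} frames")
```

Although the five echoes merge into a single smear in either frame, the difference between the frames contains only the bubble that changed, so it can be localized on its own. The frame count simply restates the 500 frames per second and roughly 150 seconds given above.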

Could this technique be translated to a 
clinical setting? At the ultrasound wave- 
lengths used by Errico and colleagues, over- 
coming the attenuating effect of the thick 
human skull will present a considerable 
challenge. The authors point out that it might 
be possible to circumvent this problem by 
using longer wavelengths, which are less 
severely attenuated. Nonetheless, imaging of 
less-challenging targets that do not require 
the ultrasound waves to pass through thick 
bone should be readily achievable. One dis- 
advantage of the new approach compared with 
conventional ultrasound imaging is the need 
to administer a contrast agent; this requires an 
intravenous cannula and can increase clinical 
scanning time. 

Super-resolution ultrasound imaging of 


Transition loses 
its invasive edge 


Two studies provide evidence that epithelial tumour cells do not need to 
transition to a mesenchymal-cell state to form metastases, but that this process 
does contribute to drug resistance. SEE ARTICLE P.472 & LETTER P.525 


SHYAMALA MAHESWARAN 
& DANIEL A. HABER 


Cancer often becomes lethal only when
cells from the primary tumour disseminate
to another organ. The early
steps of this highly complex process, called
metastasis, have been thought to rely on
non-motile epithelial tumour cells acquiring
characteristics of mesenchymal cells, which are
more migratory. This change is known as the
epithelial-to-mesenchymal transition (EMT).
The migrating cancer cells then undergo a
reverse mesenchymal-to-epithelial transition
when they seed a secondary tumour. Metastases
therefore display the same epithelial-cell
predominance as primary cancers, leaving no
evidence of their transient mesenchymal state.
Now, two papers in this issue (Fischer et al.
and Zheng et al.) present data that challenge
the role of EMT as a crucial effector of cancer
metastasis.
In terms of cancer-cell characteristics, the 
epithelial lineage is associated with increased 
proliferation, whereas the mesenchymal
lineage is linked to enhanced avoidance of
anoikis (a form of death that occurs when cells 
detach from their normal tissue matrix) and 
of drug-induced death. Current understand- 
ing of EMT-induced metastasis is derived 
from in vitro data and mouse models of 
human cancers. But the inherent plasticity of 
the process and the limited clinical evidence 
supporting the occurrence of EMT in tumour 
specimens have led to scepticism about
EMT being the predominant mechanism 
governing the early steps of metastasis. EMT 
is, however, emerging as one of numerous 
mechanisms conferring resistance to various 
cancer therapies.

Using mouse models of mammary tumours, 
Fischer and co-workers (page 472) surveyed 
the fate of epithelial tumour cells transition- 
ing to a mesenchymal state, from the cells’ 
inception and dissemination through the 
bloodstream to their exit from blood vessels 
and metastatic growth. To do this, the authors 
monitored the expression of green fluorescent 
protein (GFP) as a proxy for the expression 
of the genes that encode fibroblast-specific

microvasculature is an exciting prospect. The
technique has the potential to substantially 
advance the study of normal blood-vessel 
function, as well as disease. Moreover, it might 
enable doctors to readily identify microvessel- 
related disorders, such as tumour-related 
vessel growth and microvascular abnormalities 
in deep abdominal organs such as the kidneys, 
and to assess cardiovascular disease.

protein 1 (FSP1) or vimentin, which are
triggered when epithelial tumour cells switch 
to a mesenchymal state. The green fluores- 
cence persists in the progeny of these cells well 
after they revert to an epithelial fate. 

In two mouse models, the authors found 
evidence for EMT in a minor fraction of cells in 
the primary tumour and in a subset of circulat- 
ing tumour cells. However, the vast majority of 
metastatic tumours were not derived from the 
mesenchymal-switched cells expressing GFP, 
but from disseminating epithelial cells (Fig. 1). 
The researchers further show that inhibiting 
expression of the genes ZEB1 and ZEB2, which 
locks tumour cells in the epithelial state, did 
not impair metastasis of the mouse mammary 
tumours to the lung. 

Zheng et al. (page 525) reached a similar 
conclusion using tissue-specific deletion of 
the EMT-inducing transcription factors Snail 
or Twist to assess the consequences of EMT 
in a mouse model of pancreatic cancer. They 
found that loss of either Snail or Twist in the 
pancreatic epithelium does not affect tumour 
formation or overall survival, but it does sup- 
press EMT in the primary tumour. Despite 
the lower frequency of cells expressing mes- 
enchymal marker proteins in these tumours 
compared with those in mice in which Snail 
and Twist are expressed at normal levels, there 
were a similar number of metastases in the 
liver, lungs and spleen of these mice. 

Although they question the role of EMT 
in metastatic dissemination, both research 
groups go on to conclude that EMT con- 
tributes to drug resistance. In Fischer and 
colleagues’ mammary-tumour model, treat- 
ment with the drug cyclophosphamide led 
to enhanced survival and proliferation of

Figure 1 | Metastatic potential. A small fraction of epithelial cells in a
solid tumour acquires mesenchymal-cell characteristics during tumour 
progression, through a process known as epithelial-to-mesenchymal 
transition (EMT). Both epithelial and mesenchymal cancer cells can invade 
the bloodstream and exit it at distant sites, where the mesenchymal cells 
undergo a reverse mesenchymal-to-epithelial (MET) transition. Contrary 


EMT-switched tumour cells that had reached 
the lung. Analysis of the genes being expressed 
in these cells correlated this resistance with 
increased expression of genes encoding drug- 
metabolizing enzymes and drug-transporter 
proteins. These findings were mirrored in 
Zheng and colleagues’ Snail- or Twist-deleted 
pancreatic cancers, in which tumour cells with 
epithelial characteristics expressed higher 
levels of nucleoside-transporter proteins than 
did mesenchymal cells, potentially render- 
ing the epithelial cells more sensitive to the 
chemotherapeutic drug gemcitabine. 

These findings challenge the prevailing 
hypothesis that EMT is a key element in the 
metastatic dissemination of epithelial cancers, 
and they point to a distinct role of this cell-fate 
transition in enhancing cancer-cell survival 
during drug treatment. How can these data be 
reconciled with compelling previous reports 
on the role of EMT in metastasis? The earlier 
work studied the effects of EMT induced by 
the growth factor TGF-β or by overexpression
of transcriptional regulators such as Snail or 
Twist. Such approaches might not capture the 
physiological process that occurs spontane- 
ously in cancer cells as accurately as do the 
methods used in the current studies. It is probable
that the induction of EMT that occurs
during the natural progression of a cancer
is more subtle than the full EMT switch
that is induced by the expression of powerful 
regulators and that has been associated with 
high levels of metastasis. 

However, the limitations of the models used 
by Fischer et al. and Zheng et al. need to be 
considered before EMT-mediated tumour 
invasion can be dismissed outright. EMT is 
orchestrated by complex circuitry involving 
multiple signalling molecules and transcription 
factors. Tracing switched cells on the basis of 
expression of a single gene may therefore not 
fully capture these complicated features. Simi- 
larly, Snail and Twist function redundantly in 
many settings, and inactivation of both (and
potentially of other transcriptional regulators) 
simultaneously, rather than individually, may
be required to abrogate EMT. Furthermore,
cancer is a highly variable disease, and its full 
complexity cannot be completely captured in 
mouse models that are driven by expression of 
a few cancer-initiating genes. Nonetheless, the 
conclusions reached by the two studies warrant 
a re-evaluation of the role of EMT in cancer 
progression. Alternative ways in which epithe- 
lial cells could enter the bloodstream without 
acquiring mesenchymal properties, such as 
collective epithelial-cell migration or tumour
fragmentation, are worth investigating.

The postulated role of EMT in mediating 
cancer-cell survival is reinforced by the two 
latest studies. Indeed, EMT has been linked 
to drug susceptibility of cancer cells, as well as 
to their entrance into a non-proliferative state 
in which they have stem-cell-like properties.
Understanding the many cellular pathways 
that together determine these cell fates, and 
how these pathways are modulated, is likely to 
provide fertile ground for drug discovery and 
for new therapeutic strategies.

to previous opinion, Fischer et al. and Zheng et al. find that the majority of
metastatic tumours at secondary sites are initiated by epithelial cells from the 
primary tumour, and not by cells that have undergone EMT and subsequent 
MET. However, both research groups also show that such transitioned cells are 
more resistant to chemotherapeutic drugs than are untransitioned epithelial 
cells and emerge as the dominant metastatic population following treatment.

Hidden reservoirs


West Africa’s Ebola epidemic continues to reveal surprises. Although the animal 
species that originally passed the virus to people remains a mystery, a virus 
reservoir and persistent disease have been identified in some human survivors. 


JONATHAN L. HEENEY 


Animals are reservoirs for many pathogens
that occasionally jump species
and infect humans. In December 2013 
in the forests of Guinea, a two-year-old boy 
became infected with the Zaire strain of Ebola 
virus from an unidentified animal source.
This event triggered the largest and longest
human epidemic of Ebola viral infection in
recorded history. Across several countries in
West Africa, over 28,000 people were infected
and more than 11,000 died. This fatality rate of
less than 50% was lower than in most previous
outbreaks, and it left more than 16,000 survivors.
Studies of these survivors are changing
our understanding of Ebola virus infection and
raising concern for the long-term well-being
of these individuals and their communities.
Writing in the New England Journal of Medicine,
Deen et al. reveal that Ebola virus RNA
can persist in the semen of men for months

Figure 1 | Ebola infection dynamics in animals and humans. Ebola virus has been identified in several
animal species, including bats, chimpanzees and forest antelopes. Transmission to humans can occur
directly from reservoir species, in which the virus may persist without causing active infection, or from
amplifying host species, in which the virus replicates to high levels, often causing illness and death. Most
infected people develop acute Ebola virus disease and are highly infectious, although some individuals
survive exposure and infection without developing symptoms. There is also growing evidence that the
virus can persist in the central nervous system and reproductive organs of some survivors of the disease,
with the possibility that these survivors could infect others months after resolution of their acute symptoms.


after their recovery from the disease, and Mate
et al. demonstrate that such persistence can be
the source of new infections through sexual 
transmission. 
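
As a quick check on the epidemic figures quoted earlier, the short calculation below works through the implied fatality rate and survivor count. It uses only the rounded totals given in the text, both of which are stated there as lower bounds, so the results are approximate; the variable names are illustrative.

```python
# Rounded figures quoted in the text; both are stated there as lower bounds.
infected = 28_000
deaths = 11_000

fatality_rate = deaths / infected   # fraction of infected people who died
survivors = infected - deaths       # people who were infected and did not die

print(f"case-fatality rate ~ {fatality_rate:.0%}")
print(f"implied survivors ~ {survivors:,}")
```

The rounded totals are consistent with a fatality rate below 50% and with a survivor population of more than 16,000.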

Deen and colleagues obtained semen samples 
from 93 Sierra Leonean men who had survived
Ebola virus disease (EVD) at various intervals 
after the onset of their disease. Although the 
authors find that the proportion of men whose 
semen contained Ebola virus RNA waned with 
time, the viral genomes persisted for as long 
as 7-9 months after recovery. Mate and col- 
leagues provide convincing evidence that a 
female patient in Liberia, who subsequently 
died, had contracted Ebola virus through 
unprotected vaginal intercourse with her male 
partner, who had survived EVD. 

These observations support similar findings 
in previous epidemics of filoviruses, the virus 
family to which Ebola belongs. There have 
been reports of the persistence of Marburg
virus in the anterior chamber of the eye and 
semen of human survivors, and of the persis- 
tence of Ebola virus in the semen of men who 
survived the 1995 outbreak in the Democratic 
Republic of the Congo. This has obvious
implications for sexual partners. 

The fact that Ebola virus is found at high levels 
in placental tissues also suggests that transmis- 
sion could occur from pregnant women who 
survive EVD to their babies, although pregnant 
women who become infected usually abort the 
fetus before term. Mother-to-child transmission
by breastfeeding in survivors of Marburg
virus has been reported, and the potential for
transmission through breast milk has also been
suggested for Ebola.

Although the relative risk of virus 
transmission from survivors is low compared
with transmission from patients with acute
EVD, a single case of new infection is sufficient 
to trigger an epidemic (Fig. 1). Thus, there is 
a strong need for rigorous assessment of the 
tissue reservoirs of Ebola in human survi- 
vors and the associated public-health risks. 
Follow-up health care should be combined 
with compassionate education of survivors 
and their communities by qualified and 
knowledgeable personnel, including advice on 
condom use. 

Another lesson to emerge from this 
epidemic is that some survivors experience 
symptoms after their recovery from the main 
disease episode, suggesting that viral persis- 
tence in certain compartments of the body is 
more serious in some survivors than previ- 
ously recognized. Reported symptoms include 
blurred vision, pain behind the eyes, hearing 
deficits, painful swallowing, joint pain, fever, 
memory loss and difficulty in sleeping.
The rehospitalization of a British nurse who
developed neurological complications more
than 9 months after surviving acute EVD is a
chilling indication that the virus can persist in 
the central nervous system and be triggered to 
reactivate or to escape immune surveillance, 
or both. Fortunately, diagnosis and success- 
ful clinical intervention were possible for the 
nurse in Britain, but this situation is unlikely 
in most communities in West Africa. 

The existence of a reservoir state in human 
Ebola survivors is now beyond debate. But 
we do not know how long viable virus can 
persist in these tissue reservoirs, nor whether 
the virus replicates there at low levels or is 
dormant and then triggered to replicate. Better 
definition and understanding of the reservoirs 
and the underlying mechanisms of post-EVD
symptoms are needed to inform clinical
management and treatment. 

For example, studies of survivors may 
identify features of their immune responses 
(such as neutralizing-antibody determinants) 
that correlate with either full viral clearance or 
the persistence of viral reservoirs. Such corre- 
lates may enable survivors to be classified into 
‘carrier’ or ‘cleared’ subtypes. Potential factors
that could predispose survivors to viral re- 
emergence also need to be taken into account, 
including genetics, compromised immunity 
owing to poor health, concurrent infections 
such as HIV, or use of immunosuppressive 
drugs. However, Ebola, like other RNA viruses, 
may be prone to mutational changes, and virus 
escape from the host’s immune response may 
eventually occur even without predisposing 
factors. 

It is also not clear how, or whether, post- 
EVD immunity is affected by the stage of 
treatment or type of therapy given, such as 
monoclonal antibodies or the antibodies in 
convalescent plasma. As well as helping to 
classify survivors, enhanced understanding of 
viral persistence will help to guide therapeu- 
tic choices — treatment with small antiviral 
molecules, for example, may facilitate full 
clearance of the virus. 

Although we are learning much about 
Ebola from this epidemic, we have yet to 
identify the events that caused the virus to 
jump to the Guinean boy almost two years 
ago. The consumption of bushmeat has been 
associated with previous epidemics, and 
some bushmeat species, such as great apes 
and forest antelopes, are susceptible to high 
levels of Ebola-virus replication and die from 
the infection. They are thus best considered 
as amplifying hosts, rather than the initial 
reservoir species (Fig. 1). Prime suspects for 
the reservoir include several species of bat, 
although a bat source has not been confirmed 
for this latest epidemic. Indeed, the animal
reservoirs of Ebola may be cloaked by seques- 
tration of the virus in much the same way as 
its persistence in human survivors, waiting 
for physiological triggers for transmission to 
unexposed animals of the same species or to 
amplifying hosts. 

Understanding the triggers of Ebola emer- 
gence, the persistence of the virus in humans 
and the infection dynamics in its animal reser- 
voirs is vital not only for the long-term care of 
survivors of this epidemic, but also for preventing
the next one.

The Moon’s tilt for gold


The Moon’s current orbit is at odds with theories predicting that its early orbit 
was in Earth’s equatorial plane. Simulations now suggest that its orbit was tilted 
by gravitational interactions with a few large bodies. SEE LETTER P.492 


ROBIN CANUP 


Four and a half billion years ago, a giant
impact with Earth is thought to have
created an Earth-orbiting disk of debris
that coalesced to form the Moon. ‘Inelastic’
collisions between such debris would dissipate
energy and remove relative up-and-down
motions, so that the Moon that assembled from
these collisions would orbit approximately
in Earth’s equatorial plane. Yet the Moon’s
current orbit implies that its initial orbit
was substantially inclined relative to Earth’s
Equator, a troubling contradiction.

On page 492 of this issue, Pahlevan and
Morbidelli identify a compelling and simple
solution to this problem: that the Moon’s
early orbit was gravitationally jostled into a
tilted state by close passes of large objects left
over from the formation of the inner planets.
The existence of a population of these
objects could also explain how elements such
as iridium, platinum and gold were delivered
to Earth’s outer layers after the Moon
formed.

The Earth-Moon pair is a dynamically
coupled system. The Moon’s gravity raises tides
on Earth, most notably in the oceans, and gravitational
interactions between these tides and
the Moon are causing Earth’s rotation to slow
as the Moon’s orbit expands. Tidal interactions
also reduce the tilt of the Moon’s orbit
relative to a preferred plane. This would have
coincided with Earth’s equatorial plane when
the early Moon was orbiting close to Earth
and transitioned to the plane of Earth’s orbit
around the Sun as the Moon’s orbit expanded.
In the absence of other effects, the current
5° inclination of the Moon’s orbit relative
to Earth’s orbital plane implies an initial 10°
inclination relative to Earth’s equatorial plane
when the Moon formed, 10 times larger than
expected according to theory.

A seemingly unrelated (until now) set
of clues about the conditions soon after the
Moon formed emerges from the abundance of
precious metals in the Earth. Elements such 
as platinum and gold are highly siderophile, 
which means that they have strong chemical 
affinities for iron. Because Earth formed in a
largely molten state, high-density iron would
have readily sunk to the planet’s centre to form
a core, taking highly siderophile elements with
it and efficiently removing these from Earth’s
upper layers. The fact that we find such elements
in relatively high abundance in rocks at
Earth’s surface suggests that they were delivered
to the planet after the end of core formation,
through a ‘late veneer’ of material that
added about the last 1% of Earth’s mass.

If Earth’s late veneer was delivered by a large 
number of small impactors, the Moon would 
have received about 1/20th as many impactors 
on the basis of its smaller cross section. But
lunar siderophile abundances imply that the 
Moon received much less than that amount. 
It thus seems probable that Earth’s late veneer 
was delivered by only a few large impactors, 
each roughly comparable in size to the Moon, 
because the Moon would have received less 
than its proportionate share under these 
circumstances. 

Pahlevan and Morbidelli use computa- 
tional methods (Monte Carlo simulations) 
to consider the effects of such a population 
of large, late-accreting background objects 
on the Moon’s early orbit. Their simulations
begin with a Moon orbiting in Earth’s equa- 
torial plane close to our planet (Fig. 1). With 
time, the Moon's orbit expands because of tidal 
interaction with Earth, and is gravitationally 
perturbed by the background objects until this 
population is depleted over typically a few tens 
of millions of years. 

Central to the new work is the recogni- 
tion that each object that ultimately collides 
with Earth first undergoes many thousands 
of non-collisional close passes, a portion of 
which strongly perturb the Moon's orbit. An 
object approaching the Moon from a random

50 Years Ago

The use of rubber gloves during
surgical operations became
general about 1900 ... The object
of an investigation was to obtain
an estimation of how frequently
wound infection originates from
bacteria on the hands of operating
staff ... Examination of the wounds
following 433 ‘clean’ operations,
of the 3,125 rubber gloves used
in those operations and of the
bacterial flora of the hands which
had worn 692 damaged gloves,
revealed no connexion between the
glove damage, the bacterial flora
and the wound infections observed.
From Nature 27 November 1965 


100 Years Ago 


The Times of November 20
published a rather flamboyant little
article, headed “A Surgical Schism?”
This article said: “Not for half a
century at least has the medical
world been so sharply divided as it
is to-day in regard to the question
of the treatment of wounds.” Now,
it is exactly half a century since
Lister ... first ventured to treat a
compound fracture by plugging the
wound with a strip of rag soaked
in undiluted and impure German
creasote. Pyaemia and septicaemia
and erysipelas were ravaging the
wards of the old Glasgow Infirmary,
and he, relying on Pasteur’s work
on the “germs of putrefaction,”
and knowing that creasote was
a good “disinfectant,” plugged
a wound with it. That was the
beginning of everything, exactly
half a century ago. To-day, there
are many methods, but they do
not all contradict or exclude each
other ... We must not imagine a
sort of desperate squabble among
our military surgeons ... The
suggestion in the Times article that
an acute controversy is proceeding
upon these matters is unfortunate
and misleading.

From Nature 25 November 1915


Figure 1 | Collisionless interactions could have altered the early Moon’s orbit. a, When the Moon first
formed, its orbit was approximately in the plane of Earth’s Equator. Over time, its orbit then expanded.
b, Pahlevan and Morbidelli propose that collisionless interactions with large objects passing through
the early Solar System would have strongly perturbed the Moon’s orbit. c, The cumulative effect of such
interactions would have tilted the Moon’s orbital plane sufficiently to explain the current inclination of the
Moon to Earth’s orbital plane around the Sun. The Moon’s orbital radius in b and c is not shown to scale.


direction may increase or decrease the Moon's 
orbital tilt. But just as a series of steps, each 
equally likely to be forward or backward, 
causes the standard deviation in the net 
distance travelled to increase with time, so 
too does a series of randomly oriented kicks 
to the lunar orbit lead to a general increase 
with time in the probability of exciting a 
minimum tilt. 
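
The random-walk analogy can be checked numerically. The sketch below is a one-dimensional toy, not the orbital Monte Carlo simulations used in the study: each kick is assumed to be exactly one unit forward or backward with equal probability, and the function names, trial count and seed are arbitrary choices for illustration.

```python
import random
import statistics


def net_displacement(n_steps, rng):
    """Sum of n_steps kicks, each +1 or -1 with equal probability."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))


def spread_after(n_steps, n_trials=2000, seed=1):
    """Standard deviation of the net displacement over many independent walks."""
    rng = random.Random(seed)
    outcomes = [net_displacement(n_steps, rng) for _ in range(n_trials)]
    return statistics.pstdev(outcomes)


for n in (25, 100, 400):
    print(f"{n:4d} kicks: spread ~ {spread_after(n):.1f} (sqrt(n) = {n ** 0.5:.1f})")
```

Although the average displacement stays near zero, the spread grows roughly as the square root of the number of kicks, which is why a long series of randomly oriented perturbations makes a substantial net tilt increasingly likely.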

Pahlevan and Morbidelli’s results show a 
high likelihood that such random scattering 
events can cumulatively produce the neces- 
sary early tilt in the Moon’s orbit, as long as the 
number of objects that deliver the final approx- 
imately 1% of Earth’s mass is small (fewer 
than 5) and the rate of early tidal expansion 
of the Moon's orbit is sufficiently rapid. The 
rate of early tidal expansion needed is broadly 
consistent with the average tidal properties 
inferred for Earth on the basis of the expan- 
sion of the Moon's orbit to its current orbital 
distance. However, the specific values that 
would have applied to the earliest Earth remain 
uncertain. 

The magnitude of the excited tilt scales 
roughly linearly with the late mass deliv- 
ered to Earth. It is not known what fraction 
of the siderophiles that were concentrated in 
the cores of such large impactors would have 
been retained in Earth's upper layers. Improved 
models of late-veneer impacts should therefore 


be used to better constrain the late-accreted 
mass; this would in turn allow a closer approxi- 
mation of the inclination expected from scat- 
tering. Moreover, the new scattering model 
is most effective if the Moon's inclination has
been damped only by tides. If other forms 


of inclination damping have occurred, then 
— depending on the timing of this damp- 
ing — the required initial inclination might 
increase, and with it the required mass of 
background objects, perhaps to unrealistically 
high values. 

Previously reported models for the origin 
of the Moon’s inclination rely on more- 
complex processes involving either a periodic 
gravitational interaction (gravitational reso- 
nance) with the Sun or a resonant interaction
between the Moon and its precursor disk.
Both require rather narrow sets of conditions 
for success. The new mechanism is simpler 
than these models, and the population of late 
lunar-sized objects that it requires is compel- 
lingly consistent with that needed to account 
for the delivery of Earth’s precious metals, a 
completely independent constraint. Had such 
a population of objects not existed, the Moon 
might be orbiting in Earth’s orbital plane, with 
total solar eclipses occurring as a spectacular 
monthly event. But our jewellery would be 
much less impressive — made from tin and 
copper, rather than from platinum and gold.
Assassins of eyesight


A molecular cascade involving the transcription factor SIX6 and its target 
gene p16INK4a causes the death of neurons that link the eye to the brain. This 
discovery deepens our understanding of a common form of blindness, glaucoma. 


ANDREW D. HUBERMAN & RANA N. EL-DANAF 


Vision might feel easy, but an immense
number of neurons are required to
perform routine visual functions, such
as reading, navigating the street or recogniz- 
ing faces. Tightly lining the back of the eye is 
a layer of approximately 1 million neurons 
called retinal ganglion cells (RGCs), which take 
information encoded by the retina and pass it 
to the brain. Glaucoma, a disease marked
by progressive, irreversible degeneration of
RGCs, is a common form of blindness,
affecting more than 60 million people world-
wide. Although many studies have sought to
understand the cellular and molecular basis of
glaucoma, the mechanisms that drive RGC
death in this debilitating disease have remained 
mysterious. But writing in Molecular Cell, 
Skowronska-Krawczyk et al. report that certain
glaucoma-associated mutations in humans 
are linked to a defined molecular pathway that 
accelerates RGC ageing and death. 

A constellation of risk factors has been 


associated with glaucoma, one of the greatest 
of which is age. Like other forms of neuro- 
degeneration, loss of RGCs occurs more 
often in people over 60, raising questions 
about whether similar mechanisms might 
underlie glaucoma and other age-related 
neurodegenerative disorders such as 
Alzheimer’s disease. There also seems to be a
strong genetic component to glaucoma, with 
certain forms occurring four to five times more 
frequently in dark-skinned people. Finally,
the disease is often thought to be caused by 
elevated fluid pressure inside the eye. How- 
ever, abnormally high intraocular pressures are 
neither 100% predictive of nor a prerequisite 
for glaucoma, and many people with the dis- 
ease have normal eye pressures. This broad
range of risk factors has led many to speculate 
that glaucoma is caused by a variety of indivi- 
dual stressors that all increase RGC susceptibil- 
ity to death. The key questions have therefore 
become: what are the common molecular 
pathways that trigger RGC loss, and how could 
those pathways be manipulated for therapies? 

Skowronska-Krawczyk et al. analysed 
genetic-association studies in several human 
populations to find genes that are commonly 
mutated in people with primary open-angle 
glaucoma (the most common form of the 
disease). One screen picked up SIX6, which 
encodes a transcription factor that helps to 
shape the eye during embryonic and postnatal 
development. A mutation called His141, which
changes amino-acid residue 141 of the SIX6 
protein from asparagine to histidine, confers 
a risk of glaucoma. The authors performed a 
careful structural analysis, which revealed that 
this residue probably lies outside the transcrip- 
tion factor’s DNA-binding domain. Instead, 
the mutation might affect the ability of SIX6 to 
interact with other transcription factors or with 
co-factor proteins, altering the efficiency with 
which the protein can activate its target genes. 

To identify possible target genes for SIX6, 
Skowronska-Krawczyk and colleagues again
turned to genetic-association studies. These
indicated that mutations in the p16INK4a
gene are a strong risk factor for glaucoma. 
The authors found that expression of both 
p16INK4a and SIX6 was higher in eyes of 
people with glaucoma than in those of healthy 
people. Moreover, they demonstrated that 
SIX6 binds to and activates p16INK4a.

In many cell types, p16INK4a is associated 
with a cellular ageing process called senes- 
cence. Skowronska-Krawczyk et al. found that 
approximately four times more RGCs were 
senescing in patients with glaucoma than in 
healthy people. To probe this pathway further, 
the authors engineered human retinal progeni- 
tor cells cultured in vitro to express the SIX6 
His141 mutation. The mutant protein strongly 
upregulated p16INK4a and another marker of 
cellular senescence, the IL-6 gene. This effect 
seems to be specific to the His141 mutation, 
because upregulation of these markers did 


Figure 1 | Molecular pathways that underlie glaucoma. Age, elevated pressure in the eye and certain
genetic mutations are all associated with an increased risk of glaucoma, a form of blindness linked to the
degradation of retinal ganglion cells (RGCs). Skowronska-Krawczyk et al. report that these risk factors
converge on a single molecular cascade in which the transcription factor SIX6 binds to and activates
the gene p16INK4a. Increased p16INK4a expression causes RGC senescence and, eventually, RGC
degradation and death.


not occur in cells producing wild-type SIX6 
or forms of SIX6 mutated at different residues. 
Together, the results indicate that the His141 
mutation increases the effectiveness with 
which SIX6 activates p16INK4a and triggers
senescence pathways in RGCs. 

Skowronska-Krawczyk and colleagues next 
explored whether activation of p16INK4a 
was linked to RGC ageing or death in mice in 
which intraocular pressure was experimen- 
tally raised. They found that expression of 
both SIX6 and p16INK4a increased markedly 
after experimental elevation of intraocular 
pressure. The evidence for an interaction 
between SIX6 and p16INK4a was further 
bolstered by the discovery that p16INK4a 
expression was reduced in mice lacking SIX6, 
and that elevated intraocular pressure increased 
SIX6-p16INK4a binding in wild-type mice. 
As in human glaucomatous retinas, increases 
in intraocular pressure dramatically elevated 
the number of senescent RGCs. Together, 
these results suggest that increased p16INK4a 
expression is a major cause of cellular- 
senescence pathways that ultimately lead to 
RGC degeneration and death in glaucoma. 

In a final set of experiments, the authors 
performed a crucial test of this model by assess- 
ing whether genetic deletion of p16INK4a or 
partial deletion of SIX6 impeded RGC death 
in a mouse model of glaucoma. Remarkably, 
when intraocular pressure was experimentally 
increased in either of these genetically mutated 
mouse strains, RGCs resisted death, strongly 
supporting the idea that SIX6-activated 
increases in p16INK4a mediate RGC loss in 
response to different stressors (Fig. 1). 

Skowronska-Krawczyk and colleagues’ 
study is an important step forward. First, it 
provides support for the long-held view that, 
even though different risk factors and stressors 
can increase the likelihood of glaucoma, there 
is a common molecular mechanism by which
those stressors act to kill RGCs. Second, the 
study indicates that cellular senescence and 
its associated pathways are precursors to RGC 
degeneration and death. 

Over the past few years, there has been a 
surge in our understanding about which RGCs 


are most vulnerable in early-stage glaucoma,
and of the ion channels required to trans-
late intraocular pressure increases into RGC
degradation and death. The current study pro-
vides a solid molecular foundation on which to 
integrate these findings. A more complete 
understanding of the biological underpinnings 
of glaucoma will no doubt also help to identify 
new targets for intervention, and might reveal 
mechanistic insights into the molecular basis of 
other age-related neurodegenerative diseases, 
such as Alzheimer’s and Parkinson's disease.
CORRECTION

The News & Views article ‘Rehabilitation: 
Boost for movement’ by Randolph J. Nudo 
(Nature 527, 314-315; 2015) omitted 

to mention that the author has declared 
competing financial interests. Details are 
available in the online version of the article. 
Hemichordate genomes and
deuterostome origins 
Acorn worms, also known as enteropneust (literally, ‘gut-breathing’) hemichordates, are marine invertebrates that
share features with echinoderms and chordates. Together, these three phyla comprise the deuterostomes. Here we 
report the draft genome sequences of two acorn worms, Saccoglossus kowalevskii and Ptychodera flava. By comparing 
them with diverse bilaterian genomes, we identify shared traits that were probably inherited from the last common 
deuterostome ancestor, and then explore evolutionary trajectories leading from this ancestor to hemichordates, 
echinoderms and chordates. The hemichordate genomes exhibit extensive conserved synteny with amphioxus and other 
bilaterians, and deeply conserved non-coding sequences that are candidates for conserved gene-regulatory elements. 
Notably, hemichordates possess a deuterostome-specific genomic cluster of four ordered transcription factor genes, the 
expression of which is associated with the development of pharyngeal ‘gill’ slits, the foremost morphological innovation 
of early deuterostomes, and is probably central to their filter-feeding lifestyle. Comparative analysis reveals numerous
deuterostome-specific gene novelties, including genes found in deuterostomes and marine microbes, but not other 
animals. The putative functions of these genes can be linked to physiological, metabolic and developmental specializations
of the filter-feeding ancestor.


The prominent pharyngeal gill slits, rigid stomochord, and midline 
nerve cords of acorn worms led 19th century zoologists to designate 
them as ‘hemichordates’ and group them with vertebrates and other 
chordates, but their early embryos and larvae also linked them to
echinoderms. Current molecular phylogenies strongly support the
affinities of hemichordates and echinoderms as sister phyla, together
called ambulacrarians, and unite ambulacrarians and chordates within
the deuterostomes (see glossary in Supplementary Note 1). Of all the 
shared derived morphological characters proposed between hemichor- 
dates and chordates, the pharyngeal gill slits have emerged with unam- 
biguous morphological and molecular support, notably the shared 
expression of the pax1/9 gene. These structures were ancestral
deuterostome characters elaborated upon the bilaterian ancestral body 
plan, but the gill slits were subsequently lost in extant echinoderms and
amniotes. Since extant invertebrate deuterostomes use this apparatus
for efficient suspension and/or deposit feeding, the early Cambrian or 
Precambrian deuterostome ancestor probably also shared this lifestyle. 
This perspective on the last common deuterostome ancestor informs 
our understanding of the subsequent evolution of hemichordates, 
echinoderms and chordates.

Hemichordates share bilateral symmetry, gill slits, soft bodies and 
early axial patterning with chordates, making them key comparators 
for inferring the ancestral genomic features of deuterostomes. To 
this end, we sequenced and analysed the genomes of acorn worms 
belonging to the two main lineages of enteropneust hemichordates 
(Supplementary Note 1): Saccoglossus kowalevskii (Harrimaniidae; 
Atlantic, North America, Fig. 1a) and Ptychodera flava (Ptychoderidae; 
Pacific, pan-tropical, Fig. 1b). Both have characteristic three-part 
Figure 1 | Hemichordate model systems and their embryonic
development. The hemichordate phylum includes the enteropneusts 
(acorn worms) and pterobranchs (minute, colonial, tube-dwelling; not 
shown). a, c, Saccoglossus kowalevskii (Harrimaniid (direct developing) 
enteropneust) adult (a) and juvenile (c) with gill slits. b, d, Ptychodera 


bodies comprising proboscis, collar and trunk, the last with tens to 
hundreds of pairs of gill slits. While S. kowalevskii develops directly to a 
juvenile worm with these traits within days (Fig. 1c, e), P. flava develops
indirectly through a feeding larva that metamorphoses to a juvenile 
worm after months in the plankton (Fig. 1d, e). Our analyses begin 
to integrate macroscopic information about morphology, organismal 
physiology, and descriptive embryology of these deuterostomes with 
genomic information about gene homologies, gene arrangements, gene 
novelties and non-coding elements. 


Genomes 

We sequenced the two acorn worm genomes by random shotgun meth- 
ods with a variety of read types (Methods; Supplementary Note 2), 
each starting from sperm from a single outbred diploid individual. The 
haploid lengths of the two genomes are both about 1 Gbp (Extended 
Data Fig. 1), but differ in nucleotide heterozygosity. Both acorn worm 
genomes were annotated using extensive transcriptome data as well 
as standard homology-based and de novo methods (Supplementary 
Note 3). Counting gene models with at least one detectable orthologue 
in another sequenced metazoan species, we find that Ptychodera and 
Saccoglossus encode at least 18,556 and 19,270 genes, respectively 
(Methods). Additional de novo gene predictions include divergent and/ 
or novel genes (Extended Data Fig. 1). Despite the ancient divergence 
of the Saccoglossus and Ptychodera lineages (more than 370 million 
years ago, see below) and their different modes of development, the 
two acorn worm genomes have similar bulk gene content, as discussed 
later (Extended Data Fig. 2 and Supplementary Note 4), and similar 
repetitive landscapes (Supplementary Note 5). 


Deuterostome phylogeny 

Deuterostome relationships were originally inferred from develop- 
mental and morphological characters, and these hypotheses were
later tested and refined with molecular data. Aspects of deuterostome
phylogeny continue to be controversial, however, notably the position 
of the sessile pterobranchs among hemichordates, and the surprising 
association of Xenoturbella and acoelomorph flatworms with ambu-
lacrarians proposed by some studies. We explored these issues using
genome-wide analyses of the newly sequenced hemichordate genomes 
augmented with extensive new RNAseq from five echinoderms, three 
additional hemichordates (including a rhabdopleurid pterobranch) and 
two acoels (Fig. 2, Extended Data Fig. 3, Methods and Supplementary 
Note 6). We recovered the monophyly of hemichordates, echino- 
derms, ambulacrarians and deuterostomes, using not only amino acid 
characters but also presence-absence characters for introns and coding
indels (Supplementary Note 4). Our analyses also placed pterobranch 
hemichordates as the sister-group to enteropneusts rather than within
them. These phylogenetic analyses imply that genomic traits shared
by chordates and ambulacrarians can be attributed to the last common 
deuterostome ancestor (see below). Using a relaxed molecular clock, 
we estimate a Cambrian origin of hemichordates (Methods, Extended 
Data Fig. 3 and Supplementary Note 6). 

We also performed several analyses to assess the controversial rela- 
tionships between Xenoturbella, acoelomorphs and deuterostomes 
(Supplementary Note 6). With conventional site-homogeneous mod- 
els, acoels remain outside deuterostomes (Fig. 2, Supplementary
Figs 6.1 and 6.2). Alternative models, however, show equivo-
cal branching of acoels depending on the inclusion of the current 
sparse data for Xenoturbella (Supplementary Note 6). Notably, with- 
out Xenoturbella, acoels are positioned as a bilaterian sister group 
(Supplementary Fig. 6.3). Although we cannot rule out a deuteros-
tome placement for Xenoturbella, our analyses generally do not support
a grouping of acoels with deuterostomes.


The gene set of the deuterostome ancestor 

By comparative analysis, we identified 8,716 families of homologous 
genes whose distributions in sequenced extant genomes imply their 
presence in the deuterostome ancestor (Methods; Supplementary Note 
4). Owing to gene duplication and other processes the descendants of 
these ancestral genes account for ~14,000 genes in extant deuteros- 
tome genomes including human (Supplementary Table 4.1.2). The dis- 
tributions of gene functions, domain compositions, and gene family 
sizes of hemichordates resemble those of amphioxus, sea urchin, and 
sequenced lophotrochozoans more than those of ecdysozoans; verte- 
brates also form a distinct group (Extended Data Fig. 2, Supplementary 
Note 4 and Supplementary Fig. 4.2). 

Exon-intron structures of genes are generally well conserved 
among hemichordates, chordates, and many non-deuterostome meta- 
zoans, allowing us to infer 2,061 ancestral deuterostome splice sites 
(Supplementary Note 4). Among orthologous bilaterian genes we found 
23 introns and 4 coding sequence indels present only in deuterostomes 
(shared between at least one ambulacrarian and chordate), suggesting 
that these shared derived characters may be useful to diagnose clade 
membership of new candidate organisms (Supplementary Note 4). 
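
The diagnostic criterion described above (a character present in at least one ambulacrarian and at least one chordate, and absent outside deuterostomes) can be written as a simple filter over a presence/absence table. The following is a minimal sketch under stated assumptions: the character names and their distributions are invented toy data, the function name is hypothetical, and the taxon sets are a small illustrative subset rather than the species actually analysed (see Supplementary Note 4 for the real procedure).

```python
# Illustrative taxon groupings; a subset chosen for the example only.
AMBULACRARIANS = {"Saccoglossus", "Ptychodera", "sea_urchin"}
CHORDATES = {"amphioxus", "human"}
OUTGROUPS = {"Drosophila", "Lottia", "Nematostella"}


def is_deuterostome_specific(present_in):
    """True if a character occurs in at least one ambulacrarian and at least
    one chordate, and in none of the non-deuterostome outgroups."""
    present = set(present_in)
    return (
        bool(present & AMBULACRARIANS)
        and bool(present & CHORDATES)
        and not (present & OUTGROUPS)
    )


# Toy presence/absence table (hypothetical characters and distributions).
characters = {
    "intron_A": {"Saccoglossus", "human"},
    "intron_B": {"Saccoglossus", "Ptychodera"},         # ambulacrarians only
    "intron_C": {"amphioxus", "sea_urchin", "Lottia"},  # also in a protostome
    "indel_D": {"Ptychodera", "amphioxus", "human"},
}

diagnostic = sorted(
    name for name, taxa in characters.items() if is_deuterostome_specific(taxa)
)
print(diagnostic)
```

A new candidate organism could then be scored by how many of the diagnostic characters it carries, which is the use the text suggests for these shared derived characters.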

Based on whole-genome alignments, we identified 6,533 con- 
served non-coding elements (CNE) longer than 50 bp that are found 
in all of the five deuterostomes Saccoglossus, Ptychodera, amphioxus, 
Figure 2 | Phylogenetic placement of deuterostome taxa within the
metazoan tree. Maximum-likelihood tree obtained with a super-matrix of 
506,428 amino-acid residues gathered from 1,564 orthologous genes in 52 
species (65.1% occupancy) and using a LG+Γ model partitioned for each


sea urchin, and human (Methods; Supplementary Note 8). The iden- 
tified CNEs overlap extensively with human long non-coding RNAs 
(3,611 CNE loci; 55%, Fisher’s exact test P value < 2.2 × 10⁻¹⁶). Those
alignments usually do not exceed 250 bp (as has been reported among
vertebrates) and occur in clusters (Supplementary Note 8). Among
these conserved sequences is a previously identified vertebrate brain
and neural tube specific enhancer, located close to the sox14/21 ortho-
logue in all five species.


Conserved gene linkage 

Ancient gene linkages (‘macro-synteny’) are often preserved in extant
bilaterian genomes. Comparative analysis revealed 17 ancestral
linkage groups across chordates, including amphioxus and Ciona.
While the contiguity of the draft of the sea urchin genome assembly
is too limited to determine whether it shares this chromosome-scale 
organization, we find that the Saccoglossus genome clearly shares these 
chordate-defined linkage groups (Fig. 3a and Supplementary Note 7), 
implying that these chromosome-scale linkages were also present in 
the ancestral deuterostome. 

On a more local scale, we find hundreds of tightly linked conserved
gene clusters of three or more genes (‘micro-synteny’; Methods;
Supplementary Note 7) including Hox and ParaHox clusters in both
acorn worms (Extended Data Fig. 4), as also found in echinoderms.
Saccoglossus and amphioxus share more micro-syntenic linkages with 
each other than either does with sea urchin, vertebrates, or available pro- 
tostome genomes (Methods, Fig. 3b and Extended Data Figs 5 and 6). 
Conservation of micro-syntenic linkages can occur due to low rates of 
genomic rearrangement or, more interestingly, as a result of selection 
to retain linkages between genes and their regulatory elements located
in neighbouring genes.
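
To illustrate what a micro-syntenic cluster of three or more genes means operationally, the sketch below scans two gene orders for runs of orthologues that are adjacent in both. It is a simplified toy, not the pipeline described in Methods and Supplementary Note 7: it assumes each gene is labelled by a single orthologue identifier that occurs at most once per genome, it requires strict adjacency with no intervening genes, and the function name and gene labels are hypothetical.

```python
def conserved_blocks(genome_a, genome_b, min_genes=3):
    """Return runs of at least min_genes orthologues that are adjacent, in the
    same or reversed order, in both gene lists."""
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    blocks = []
    run = []   # current run of genes adjacent in both genomes
    step = 0   # direction of the run in genome_b: +1, -1, or 0 if undecided

    def flush():
        # Keep the current run only if it is long enough.
        if len(run) >= min_genes:
            blocks.append(list(run))

    for gene in genome_a:
        if gene in pos_b and run:
            delta = pos_b[gene] - pos_b[run[-1]]
            if delta in (1, -1) and step in (0, delta):
                run.append(gene)
                step = delta
                continue
        # Run is broken: store it if long enough and start a new one.
        flush()
        run = [gene] if gene in pos_b else []
        step = 0
    flush()
    return blocks


# Toy gene orders (hypothetical orthologue labels).
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
genome_b = ["g9", "g3", "g2", "g1", "g7", "g5", "g6", "g4", "g8"]
print(conserved_blocks(genome_a, genome_b))
```

In this toy case the only block of three or more genes is g1, g2, g3, which is inverted in the second genome. Real analyses typically tolerate a few intervening genes and handle paralogues, which this sketch deliberately omits.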


A deuterostome pharyngeal gene cluster 
One conserved deuterostome-specific micro-syntenic cluster with 
functional implications for deuterostome biology is a cluster of genes 
expressed in the pharyngeal slits and surrounding pharyngeal endo- 
derm (Fig. 4; Supplementary Note 9). This six-gene cluster contains 
four transcription factor genes in the order nkx2.1, nkx2.2, pax1/9 
and foxA, along with two non-transcription-factor genes slc25A21 
and mipol1, whose introns harbour regulatory elements for pax1/9 and
foxA, respectively. The cluster was first found conserved across
vertebrates including humans (see chromosome 14; 1.1 Mb length from
nkx2.1 to foxA1). In S. kowalevskii, it is intact with the same gene
order as in vertebrates (0.5 Mb length from nkx2.1 to foxA), imply-
ing that it was present in the deuterostome and ambulacrarian ances- 
tors. The full ordered gene cluster also exists on a single scaffold in 
the crown-of-thorns sea star Acanthaster planci. Since these genes are 
not clustered in available protostome genomes, there is no evidence 
for deeper bilaterian ancestry. Two non-coding elements that are con- 
served across vertebrates and amphioxus are found in the hemichor-
date and A. planci clusters at similar locations (A2 and A4, in Fig. 4a).
The pax1/9 gene, at the centre of the cluster, is expressed in the phar-
yngeal endodermal primordium of the gill slit in hemichordates, tuni-
cates, amphioxus, fish, and amphibians, and in the branchial pouch
endoderm of amniotes (which do not complete the last steps of gill slit 
formation), as well as other locations in vertebrates. The nkx2.1 (thy- 
roid transcription factor 1) gene is also expressed in the hemichordate 
Figure 3 | High level of linkage conservation in Saccoglossus.
a, Macro-synteny dot plot between Saccoglossus and amphioxus; each dot
represents two orthologous genes linked in the two species, and ordered 
according to their macro-syntenic linkage. Amphioxus scaffolds are 
organized according to the 17 ancestral linkage groups (ALGs) inferred by 
comparison of the amphioxus and vertebrate genomes. Intersection areas
of highest dot density are marked by numbers along the top of the plot, 
identifying each of the 17 putative ALGs. Axes represent orthologous gene 
group index along the genome. b, Branch-length estimation for loss and 
gain of synteny blocks with MrBayes, see Supplementary Note 7 for details. 
Short branches in hemichordates (in bold) indicate a high level of 
micro-syntenic retention in their genomes. 


pharyngeal endoderm in a band passing through the gill slit, but not 
localized to a thyroid-like organ. Here we also examined the expres-
sion of nkx2.2 and foxA in S. kowalevskii. We find that nkx2.2, which 
is expressed in the ventral hindbrain in vertebrates, is expressed in 
pharyngeal ventral endoderm in S. kowalevskii, close to the gill slit 
(Fig. 4b), and that foxA is expressed throughout endoderm but 
repressed in the gill slit region (Fig. 4b). The co-expression of this 
ordered cluster of the four transcription factors during pharyngeal 
development strongly supports the functional importance of their 
genomic clustering. 

The presence of this cluster in the crown-of-thorns sea star, an 
echinoderm that lacks gill pores, and in amniote vertebrates that lack 
gill slits, suggests that the cluster’s ancestral role was in pharyngeal 
apparatus patterning as a whole, of which overt slits (perforations of 
Figure 4 | Conservation of a pharyngeal gene cluster across
deuterostomes. a, Linkage and order of six genes including the four genes
encoding transcription factors Nkx2.1, Nkx2.2, Pax1/9 and FoxA, and two
genes encoding non-transcription factors Slc25A21 (solute transporter)
and Mipol1 (mirror-image polydactyly 1 protein), which are putative
‘bystander’ genes containing regulatory elements of pax1/9 and foxA,
respectively. The pairings of slc25A21 with pax1/9 and of mipol1 with foxA
occur also in protostomes, indicating bilaterian ancestry. The cluster is
not present in protostomes such as Lottia (Lophotrochozoa), Drosophila
melanogaster, Caenorhabditis elegans (Ecdysozoa), or in the cnidarian,
Nematostella. SLC25A6 (the slc25A21 paralogue on human chromosome
20) is a potential pseudogene. The dots marking A2 and A4 indicate two
conserved non-coding sequences first recognized in vertebrates and
amphioxus, also present in S. kowalevskii and, partially, in P. flava and
A. planci. b, The four transcription factor genes of the cluster are expressed
in the pharyngeal/foregut endoderm of the Saccoglossus juvenile: nkx2.1
is expressed in a band of endoderm at the level of the forming gill pore,
especially ventral and posterior to it (arrow), and in a separate ectodermal
domain in the proboscis. It is also known as thyroid transcription factor
1 due to its expression in the pharyngeal thyroid rudiment in vertebrates.
The nkx2.2 gene is expressed in pharyngeal endoderm just ventral to
the forming gill pore, shown in side view (arrow indicates gill pore) and
ventral view; and pax1/9 is expressed in the gill pore rudiment itself. In
S. kowalevskii, this is its only expression domain, whereas in vertebrates
it is also expressed in axial mesoderm. The foxA gene is expressed widely
in endoderm but is repressed at the site of gill pore formation (arrow). An
external view of gill pores is shown; up to 100 bilateral pairs are present in
adults, indicative of the large size of the pharynx.
Figure 5 | Examples of deuterostome gene novelties. a, Steps of
biosynthesis of sialic acid and its addition to and removal from
glycoproteins. b-d, Novel genes in TGFβ signalling pathways. The
encoded proteins are shown and include Lefty (b), an antagonist of Nodal
signalling, which activates Smad2/3-dependent transcription when not
antagonized; Univin (c), an agonist of Nodal signalling, also called Vg1,
DVR1, and GDF1; and TGFβ2 (d), a ligand that activates Smad2/3-
dependent transcription by binding to a deuterostome-specific TGFβ


apposed endoderm and ectoderm) were but one part, and the cluster is 
retained in these cases because of its continuing contribution to phar- 
ynx development. Genomic regions of the pharyngeal cluster have been 
implicated in long-range promoter-enhancer interactions, support- 
ing the regulatory importance of this gene linkage (see Supplementary 
Note 9). Alternatively, genome rearrangement in these lineages may
be too slow to disrupt the cluster even without functional constraint. 
Here we propose that the clustering of the four ordered transcription 
factors, and their bystander genes, on the deuterostome stem served a 
regulatory role in the evolution of the pharyngeal apparatus, the fore- 
most morphological innovation of deuterostomes. 


Deuterostome novelties 

We found >30 deuterostome genes with sequences that differ mark- 
edly from those of other metazoans, related to functional innovation 
in deuterostomes. Some plausibly arose from accelerated sequence 
change on the deuterostome stem from distant but identifiable bila- 
terian homologues, others represent new protein domain combina- 
tions in deuterostomes, while others lack identifiable sequence and 
domain homologues in other animals. In the latter group, we found 
over a dozen deuterostome genes that have readily identified relatives 
in marine microbes, often cyanobacteria or eukaryotic micro-algae, 
but are not known in other metazoans (Extended Data Table 1 and 
Extended Data Fig. 7; Supplementary Notes 10.4 and 10.5). Such genes 
receptor type II, which contains a novel ectodomain (not shown). Also
shown in d is the novel protein thrombospondin 1 that activates TGFβ2
by releasing it from an inactive complex, by way of its TSP1 domains. Red 
boxes around protein names indicate their deuterostome novelty. Green 
include two of the novel deuterostome sequences associated with sialic
acid metabolism (found in many microbes, see below), enzymes that
modify proteins (for example, protein arginine deiminase) and RNA 
(for example, FATSO methyladenosine demethylase) as well as others 
that provide specialized reactions of secondary metabolism (Extended 
Data Table 1 and Extended Data Fig. 7; Supplementary Note 10.5). 
Possible explanations for the unusual phylogenetic distribution of these 
genes include horizontal transfer on the deuterostome stem from early 
marine microbes (which were plausibly commensals, pathogens, or 
food sources of stem deuterostomes), or convergent gene loss and/or 
extensive sequence divergence along five or more opisthokont lineages 
(Supplementary Note 10.2). 

Regardless of their mechanism of origination, the various deuteros- 
tome novelties and gene family expansions of sialic acid metabolism are 
noteworthy. Deuterostomes are unique among metazoans in their high 
level and diverse linkage of addition of sialic acid (also known as neu- 
raminic acid), a nine carbon negatively charged sugar, to the terminal 
sugars of glycoproteins, mucins and glycolipids. We find expanded
families of enzymes for several of these reactions in hemichordates 
(Fig. 5a and Extended Data Table 1). Based on the presence/absence 
of relevant enzymes we infer that 5 of the 11 steps of the pathways of 
sialic acid formation, addition to termini, and removal are not found 
in protostomes or other metazoans, and are deuterostome novelties 
(Fig. 5a and Supplementary Note 10), whereas the other steps use 
enzymes similar to those of the more limited pathway of some protos-
tomes (for example, insects such as Drosophila). 

The importance of glycoproteins for muco-ciliary feeding and other 
hemichordate activities is further supported by novel and expanded 
families of genes encoding the polypeptide backbones of glycopro- 
teins, those with von Willebrand type-D and/or cysteine-rich domains 
(PTHR11339 classifier), including mucins, present in hemichordates 
and amphioxus as large tandemly duplicated clusters (with varied 
expression patterns as shown in Extended Data Fig. 8), but not in sea 
urchin, which has a different mode of feeding (Supplementary Note 
10). As in amphioxus, the pharynx of Saccoglossus is heavily ciliated, 
and cells of the pharyngeal walls in hemichordates and the ventral 
endostyle in amphioxus secrete abundant mucins and glycoproteins. 
Similarly, in the deuterostome ancestor these glycoproteins probably 
enhanced the muco-ciliary filter-feeding capture of food particles from 
the microbe-rich marine environment and protected its inner and outer 
tissue surfaces. 


Novelty in the TGFβ signalling pathway 

The signalling ligands Lefty (a Nodal antagonist) and Univin/Vg1/GDF1 
(a Nodal agonist) are deuterostome innovations that modulate 
Nodal signalling during the major developmental events of endomesoderm 
induction and axial patterning in vertebrates, axial patterning 
in hemichordates and echinoderms, and left-right patterning in all 
deuterostomes (see Fig. 5b-d and Extended Data Fig. 9a, b). Univin 
is tightly linked to the related bilaterian bmp2/4 in the sea urchin 
genome and also, we now report, in hemichordates and amphioxus, 
supporting its origin by tandem duplication and divergence from an 
ancestral bmp2/4-type gene, as suggested previously. 

TGFβ2 signalling (TGFβ1, 2 and 3 in vertebrates) is a deuterostome 
innovation that controls cell growth, proliferation, differentiation 
and apoptosis at later developmental stages. Accompanying the novel 
TGFβ2 ligand, the type II receptor has a novel ectodomain. The 
extracellular matrix protein thrombospondin 1, which activates TGFβ2 in 
vertebrates, contains a deuterostome-unique combination of domains 
including three thrombospondin type 1 (TSP1) domains that bind 
the TGFβ2 pro-domain region. While these signalling novelties have 
clear sequence similarity to pan-bilaterian components, they form 
long stem branch clades on the phylogenetic trees, indicating exten- 
sive sequence divergence on the deuterostome stem (Supplementary 
Note 10). Together, these innovations appear to contribute to the 
increased amount and complex patterning of Smad2/3-mediated 
signalling in deuterostomes compared with protostomes and other 
metazoans. 


Conclusion 

The two acorn worms whose genomes are described here represent 
the two main enteropneust lineages, separated by at least 370 million 
years and differing in their developmental modes. These analyses 
reveal (1) extensive conserved macro-synteny among deuterostomes; 
(2) a widely conserved deuterostome-specific cluster of six ordered 
genes, including four transcription factor genes that are expressed 
during the development of pharyngeal gill slits and the branchial 
apparatus, the most prominent morphological innovation of the deu- 
terostome ancestor; and (3) numerous gene novelties shared among 
deuterostomes, many expanded into large families, with putative pro- 
tein functions that imply physiological, metabolic and developmental 
specializations of the filter-feeding deuterostome ancestor. Some of 
these genes lack identifiable orthologues in other metazoans but do 
resemble microbial sequences and domain types. In addition to their 
contributions towards defining the deuterostome ancestor and illu- 
minating chordate origins, the two genomes should inform hypoth- 
eses of larval evolution by providing a basis for future comparisons 
of direct-developing and indirect-developing acorn worms, which 
achieve remarkably similar adult forms by distinct embryological 
routes (Fig. 1). 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Sequencing. Sperm DNA from adult males was extracted for sequencing as 
described in Supplementary Note 2. A single male was used for each species to min- 
imize the impact of heterozygosity on assembly. For Saccoglossus, approximately 
eightfold redundant random shotgun coverage (totalling 8.1 Gb) was obtained with 
Sanger dideoxy sequencing at the Baylor College of Medicine Genome Center, 
including 34,279 BAC ends and 459,052 fosmid ends. For Ptychodera, 1.3 Gb in 
Sanger shotgun sequences, 15.3 Gb in Roche 454 pyrosequence reads, and 52-Gb 
paired-end sequences with Illumina MiSeq, along with mate-pairs, were generated 
at the Okinawa Institute of Science and Technology Graduate University. More 
sequencing details are available in Supplementary Note 2. 

Genome assemblies. We assembled the Saccoglossus genome with Arachne, 
combined with BAC/fosmid pair information to produce the final assembly. 
This Saccoglossus assembly includes 7,282 total scaffold sequences spanning a 
total length of 758 Mb. The relatively modest nucleotide heterozygosity (0.5%) of 
S. kowalevskii, coupled with longer read lengths, enabled assembly of a single com- 
posite reference sequence. Half of the assembly is in scaffolds longer than 552 kb 
(the N50 scaffold length), and 82% of the assembled sequence is found in 1,602 
scaffolds longer than 100 kb. For Ptychodera we used the Platanus assembler. 
The resulting total scaffold length was 1,229 Mb, with half the assembly in 
scaffolds longer than 196 kb (N50 scaffold length). P. flava exhibited a notably higher 
heterozygosity (1.3% single nucleotide heterozygosity with frequent indels) than 
S. kowalevskii, presumably related to its pelagic dispersal and larger effective 
population size. We therefore initially produced stringent separate assemblies of the 
two divergent haplotypes, and found that many scaffolds had a closely related 
second scaffold with ~94% BLASTN identity (over longer stretches, including 
indels). To avoid reporting both haplotypes at these loci, scaffolds with less than 
6% divergence over at least 75% of their length were merged into a single haploid 
reference for comparative analysis. To further classify regions with ‘double’ depth 
and single haplotype regions we implemented a Hidden Markov Model classi- 
fier. We find that at least 63% of the initial Platanus assembly constitutes merged 
haplotypes. The inferred SNP rate for those regions is 1.3%, while for the remaining 
haplotype regions it is below 0.1%. Further details of assemblies are described in 
Supplementary Note 2. 
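
The two assembly summaries used above, the N50 scaffold length and the haplotype merging rule, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `n50` and `should_merge` are hypothetical, and the inputs (a list of scaffold lengths, and a precomputed divergence and aligned fraction for a scaffold pair) are assumed to be available from upstream tools.

```python
def n50(lengths):
    """Return the N50 scaffold length.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def should_merge(divergence, aligned_fraction,
                 max_divergence=0.06, min_aligned_fraction=0.75):
    """Merging rule as stated in the text (thresholds are the text's values).

    A scaffold pair is merged into one haploid reference when it shows
    less than 6% divergence over at least 75% of its length.
    """
    return divergence < max_divergence and aligned_fraction >= min_aligned_fraction
```

For example, `n50([10, 8, 6, 4, 2])` returns 8, because the two longest scaffolds (18 of 30 bases) already cover half of the toy assembly.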

Gene predictions. Transcriptome data for both species were used, along with 
homology-guided and ab initio methods, to predict protein-coding genes 
(Supplementary Note 3). For Saccoglossus, 8.6 million RNAseq reads were gener- 
ated from 7 adult tissues and 15 developmental stages using Roche 454 sequencing, 
along with previously deposited ESTs in GenBank. For Ptychodera, extensive EST 
data from egg, blastulae, gastrulae, larvae, juveniles, adult proboscis, stomochord, 
and gills defining 34,159 cDNA clones, and 879,000 Roche/454 RNAseq reads 
from a mixed library of developmental stages were used. The Saccoglossus genome 
was annotated using the JGI gene prediction pipeline, while Augustus was used to 
produce gene models for Ptychodera. We find a total of 34,239 gene predictions for 
Saccoglossus (68% with transcript evidence) and 34,687 for Ptychodera (43% with 
transcript evidence), although these are overestimates of the true gene number due 
to fragmented gene predictions, mis-annotated repetitive sequences, and spurious 
predictions. As described in the main text, 18,000-19,000 gene models in each species 
have known annotations and/or orthologues in other species. 

Gene family analysis. Gene family clustering was done using a progressive (leaf 
to root) BLASTP-based clustering algorithm, where at a given phylogenetic node 
the gene families are constructed taking into account protein similarities among 
ingroups and outgroups. For the inference of deuterostome gene families we 
use the bilaterian node of the clustering. To call gene families present in the deu- 
terostome ancestor, we required (1) at least two ambulacrarian orthologues out of 
the three available ambulacrarian genomes and at least two chordate orthologues, 
or (2) at least two deuterostomes (chordates and/or ambulacrarians) and two 
outgroups in the bilaterian level clusters. 
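
The presence rule for ancestral gene families can be written as a small predicate. The sketch below is only an illustration under stated assumptions: each cluster is represented by one group label per species that contributes an orthologue, the labels `"ambulacrarian"`, `"chordate"` and `"outgroup"` are hypothetical, and "two outgroups" is read as "at least two".

```python
from collections import Counter


def called_in_ancestor(groups_present):
    """Decide whether a gene family is assigned to the deuterostome ancestor.

    groups_present holds one label per species with an orthologue in the
    cluster: "ambulacrarian", "chordate" or "outgroup" (hypothetical labels).
    """
    counts = Counter(groups_present)
    n_ambu = counts["ambulacrarian"]
    n_chor = counts["chordate"]
    n_out = counts["outgroup"]
    # Rule 1: at least two ambulacrarians and at least two chordates.
    rule1 = n_ambu >= 2 and n_chor >= 2
    # Rule 2: at least two deuterostomes of any kind plus two outgroups.
    rule2 = (n_ambu + n_chor) >= 2 and n_out >= 2
    return rule1 or rule2
```

A family seen in one ambulacrarian, one chordate and two outgroup species passes rule 2, whereas a family seen in two ambulacrarians and a single chordate passes neither rule.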

Transposable elements. Repetitive sequences were identified using RepeatScout 
followed by manual curation and annotation using both a Repbase release (version 
20140131) and BLASTX-based search against a custom collection of transposons, 
using a previously described repeat identification and annotation pipeline 
(Supplementary Note 5). The assemblies were then masked with RepeatMasker 
version open-4.0.5. The repetitive complements of the two hemichordate 
genomes are summarized in Supplementary Table 5.1. 

Phylogenetic analysis. Phylogenetic analyses were done using metazoan-level 
gene family clusters based on whole-genome sequences (Supplementary Note 4), 
selecting a single orthologue per genome with the best cumulative BLASTP to 
other species, and best reciprocal BLASTP hits to species with transcriptome-only 
information (Supplementary Note 6). Single gene alignments were built using 
Muscle and filtered using Trimal for each orthologue, and were concatenated, 
yielding a supermatrix of 506,428 positions with 34.9% missing data. This 
supermatrix was analysed with ExaML assuming a site-homogeneous LG+Γ4 model 
partitioned for each gene. A slow-fast analysis was conducted to stratify marker 
genes based on the length of the branch leading to acoels in individual trees. 
A subset of the slowest 10% of genes was analysed with the site-heterogeneous 
CAT+GTR+Γ4 model using Phylobayes. Molecular dating was carried out 
with Phylobayes using the log-normal relaxed clock model and the calibrations 
described in Supplementary Table 6.2. 
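
The slow-fast stratification step amounts to ranking marker genes by the length of the acoel branch in each single-gene tree and keeping the slowest fraction. The following Python sketch shows that selection only; the function name, the dictionary input and the rounding choice (keep at least one gene, round down otherwise) are assumptions, and branch lengths are taken as already extracted from the individual trees.

```python
def slowest_genes(acoel_branch_length, fraction=0.10):
    """Return the gene IDs with the shortest acoel branch.

    acoel_branch_length maps a gene ID to the length of the branch
    leading to acoels in that gene's tree. The slowest `fraction`
    of genes is returned, ordered from slowest to fastest.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Sort gene IDs by branch length, shortest (slowest-evolving) first.
    ranked = sorted(acoel_branch_length, key=acoel_branch_length.get)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

With 20 genes and the default fraction, the two genes with the shortest acoel branches are returned.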

Synteny analysis. Macro- and micro-syntenic linkages were calculated as described 
in Supplementary Note 7. For Fig. 3a, we merged the amphioxus scaffolds into 17 
pre-defined scaffold groups as suggested in ref. 27. These 17 merged scaffold groups 
represent the 17 ancestral linkage groups (ALGs) shared in chordates. Then we cal- 
culated the orthologous gene groups shared by each amphioxus ALG-Saccoglossus 
scaffold pair and generated the dot plot as described in Supplementary Note 7. For 
micro-synteny we required at least three genes (separated by a maximum of ten 
genes) to be present in pairwise comparisons. Under random reshuffling of the 
genome, this yields 10% false positives in pairwise genome comparisons, that is, 
we observe approximately one-tenth as many micro-syntenic blocks between the 
two genomes when gene orders are shuffled. This false-positive rate, however, falls 
to 1% when considering more than two species. For our inference of deuterostome 
ancestral and novel synteny we therefore focus on blocks present in at least three 
species (and both ingroup representatives, that is, ambulacrarians and chordates). 
This yields 698 blocks that can be traced back to the deuterostome ancestor, includ- 
ing 71 blocks found exclusively in deuterostome species (shared among ambu- 
lacrarians and chordates), including the pharyngeal cluster discussed in Fig. 4. 
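
The pairwise micro-synteny criterion and the shuffling control described above can be sketched in Python. This is a simplified illustration, not the procedure of Supplementary Note 7: each genome is reduced to a single ordered list of gene-family IDs, orthologues are assumed to be one-to-one (each ID occurs at most once per genome), and blocks are built by single-linkage clustering of shared genes. The names `synteny_blocks` and `mean_shuffled_blocks` are hypothetical.

```python
import random


def synteny_blocks(order_a, order_b, max_gap=10, min_genes=3):
    """Find micro-syntenic blocks shared by two gene orders.

    Two shared genes are linked when at most `max_gap` genes separate them
    in both genomes; linked groups with >= `min_genes` members are blocks.
    Returns a list of blocks, each a list of (index_in_a, index_in_b).
    """
    pos_b = {gene: j for j, gene in enumerate(order_b)}
    # Anchors are shared genes, already sorted by position in genome A.
    anchors = [(i, pos_b[g]) for i, g in enumerate(order_a) if g in pos_b]
    parent = list(range(len(anchors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x in range(len(anchors)):
        for y in range(x + 1, len(anchors)):
            (i1, j1), (i2, j2) = anchors[x], anchors[y]
            if i2 - i1 - 1 > max_gap:
                break  # later anchors are even further away in genome A
            if abs(j2 - j1) - 1 <= max_gap:
                parent[find(x)] = find(y)

    groups = {}
    for x, anchor in enumerate(anchors):
        groups.setdefault(find(x), []).append(anchor)
    return [block for block in groups.values() if len(block) >= min_genes]


def mean_shuffled_blocks(order_a, order_b, trials=100, seed=0, **kwargs):
    """Average block count after randomly reshuffling genome B's gene order."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shuffled = list(order_b)
        rng.shuffle(shuffled)
        total += len(synteny_blocks(order_a, shuffled, **kwargs))
    return total / trials
```

Dividing the value from `mean_shuffled_blocks` by the number of blocks found on the real gene orders gives an empirical false-positive rate of the kind quoted in the text.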

Whole-genome alignment. Whole-genome alignments were conducted with 
MEGABLAST using parameters previously reported. We assessed the 
distribution of the resulting 12,722 aligned loci across known gene annotations in 
ENSEMBL, previously identified conserved pan-vertebrate elements, as well 
as known enhancers in human according to the LBL database. 

Gene novelties. Deuterostome gene novelties were assessed initially through bila- 
terian gene clusters (Supplementary Note 10) by requiring at least two species on 
both ambulacrarian and chordate side to be present. The novelties were further 
automatically subdivided into four categories: G1 (gain type I), with no BLASTP 
hit outside of deuterostomes; G2 (gain type II), with a novel PFAM domain pres- 
ent only in deuterostomes; G3 (gain type III) having a novel PFAM combination 
unique to deuterostomes; and G4 (gain type IV), those that do not fall under any 
of the G1-3 categories and define novelties due to acceleration in the substitution 
rate on the deuterostome stem. To confirm the novel nature, especially for G4 
novelties, we have constructed phylogenies for the members and non-deuterostome 
BLASTP hits (up to an e-value of 1 × 10^-20) using MAFFT-alignment-based 
FastTree calculations. The trees were assessed for the accelerated rate of evolution 
at the deuterostome stem (Supplementary Fig. 9.1.1). The final result is provided 
in the Supplementary Information. 
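
The four novelty categories form a simple decision cascade, sketched below in Python. The boolean inputs are assumed to come from upstream BLASTP and PFAM analyses, the function name is hypothetical, and the order in which G1, G2 and G3 are tested is an assumption (the text defines G4 only as the remainder).

```python
def classify_novelty(has_hit_outside_deuterostomes,
                     has_deuterostome_only_domain,
                     has_deuterostome_only_domain_combination):
    """Assign a deuterostome gene novelty to one of the categories G1-G4."""
    if not has_hit_outside_deuterostomes:
        return "G1"  # no BLASTP hit outside deuterostomes
    if has_deuterostome_only_domain:
        return "G2"  # novel PFAM domain present only in deuterostomes
    if has_deuterostome_only_domain_combination:
        return "G3"  # novel PFAM domain combination unique to deuterostomes
    return "G4"      # remainder: candidates for accelerated stem substitution
```

Families falling into G4 are the ones for which the phylogenies described above are most important, because their novelty rests on branch length rather than on presence or absence.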

Curation of candidates for horizontal gene transfer on the deuterostome stem. 
We examined in detail gene families found broadly in deuterostomes whose 
encoded peptides were readily alignable to microbial sequences but had no detect- 
able similarity in non-deuterostome animals. Criteria for evaluation included: 
(1) the hemichordate gene matches microbial genes at least ten orders of magnitude 
in the e-value better than it matches sequences of non-deuterostome metazoans 
(most of the putative HGTs we describe have no non-deuterostome metazoan hit 
at all); (2) it has a defined genomic locus among bona fide metazoan genes; (3) it 
shares an exon-intron structure with genes of chordates and other ambulacraria; 
and (4) when a low bitscore match is found to a non-deuterostome metazoan 
sequence, that sequence is identified as containing different domains (domain 
structure according to CDD) and/or different exon-intron structure, implying 
dubious relatedness. When phylogenetic trees are constructed for these HGT- 
candidate proteins, the trees contain numerous branches for microbial sequences 
and none for non-deuterostome metazoan sequences, or only very long branches 
for dubious relatives, and hence the trees differ greatly from the metazoan species 
tree, except within the deuterostome clade. 
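
Criterion (1) compares e-values on a logarithmic scale. A minimal Python sketch follows, assuming the best microbial and best non-deuterostome metazoan e-values have already been extracted from BLAST output; the function name and the clamping of zero e-values are illustrative choices, not part of the published method.

```python
import math


def passes_evalue_criterion(microbial_evalue, metazoan_evalue=None,
                            min_orders=10):
    """Check HGT criterion (1).

    The microbial match must be at least `min_orders` orders of magnitude
    better than the best non-deuterostome metazoan match. Pass
    metazoan_evalue=None when no such metazoan hit exists.
    """
    if metazoan_evalue is None:
        return True  # no non-deuterostome metazoan hit at all
    floor = 1e-300  # BLAST can report 0.0 for very strong hits; avoid log10(0)
    micro = math.log10(max(microbial_evalue, floor))
    meta = math.log10(max(metazoan_evalue, floor))
    return meta - micro >= min_orders
```

For instance, a microbial hit at 1e-50 against a metazoan hit at 1e-30 differs by 20 orders of magnitude and passes, whereas 1e-50 against 1e-45 does not.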

Code availability. Original data and code can be accessed at 
https://groups.oist.jp/molgenu. 

Extended Data Figure 1 | Summary of genome assemblies and 
heterozygosity distributions for Saccoglossus and Ptychodera. 
a, Genome statistics summary. b, c, The single nucleotide polymorphism 
distribution across 100-bp windows for Saccoglossus (b) and the 
corresponding distribution for Ptychodera (c). The distributions in b 
and c are fitted with a geometric (expected when high recombination rate 
is present) and a Poisson distribution (expected with low recombination 
rate). The distribution for Saccoglossus is fitted to windows with 
one or more SNPs only, as there is an excess of zero SNP windows 
(approximately 84% of total 94,324 selected windows). For methods refer 
to Supplementary Note 2. 


© 2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 2 | Ambulacrarians approximate the ancestral 
metazoan gene repertoire. a, Principal component analysis of … right 
corner, also with the lophotrochozoans Cgi, Lgi, Hro, Cte and the 
non-bilaterians Hma, Nve and Adi. b, Heat map of gene family counts … 
[Figure not reproduced. Panel a plots two principal components (12% and 
7% of variance); panel b uses a row Z-score colour scale.] 

Extended Data Figure 3 | Molecular dating of deuterostome and 
metazoan radiations using PhyloBayes assuming a log-normal relaxed 
clock model. Yellow circles on particular nodes indicate the calibration 
dates applied from the fossil record, as indicated in Supplementary Note 
6.2. Bars are 95% credibility intervals derived from posterior distributions. 
Note the estimated times of divergence of chordates and ambulacraria 
(the deuterostome ancestor) at 570 million years ago (Ma; mid-Ediacaran), 
hemichordates and echinoderms at 559 Ma, enteropneusts and 
pterobranchs at 547 Ma, and Harrimaniid and Ptychoderid enteropneusts 
at 373 Ma. 

Extended Data Figure 4 | Homeobox gene complement of the two 
hemichordates in comparison to that of amphioxus. The numbers of 
homeobox-containing gene models are 170 in Saccoglossus and 139 in 
Ptychodera. These homeobox domains were aligned with 128 homeobox 
genes of Branchiostoma floridae using ClustalW2, then gaps and unaligned 
regions were manually removed. Since some genes have more than 
one homeobox domain, we kept all domains or chose the longest one 
according to the state of domain conservation. In total, 448 homeobox 
sequences were aligned. See Supplementary Information for details. 
The clusters of homeobox genes on scaffolds in Saccoglossus and 
Ptychodera were identified and drawn at positions around the tree. 
Conserved clusters between the two species were aligned. In addition 
to the well-known Hox and ParaHox clusters, 17 clusters were found in at 
least one of the hemichordates or some in both. Sixteen genes of the Nkx 
class are distributed over four clusters: (i) nkx1a-vent1-vent2.1-vent2.2; 
(ii) nkx2.1-nkx2.2-msxlx; (iii) nkx5-msx-nkx3.2-nkx4-lbx-hex; and 
(iv) voxvent-nk7like-nk7like2. The second cluster (ii) of these is part of the 
pharyngeal cluster (Fig. 4). Another five-gene cluster consists of one Lim 
class homeobox gene and four PRD class homeobox genes: isl-otp-rax-arx-gsc. 
A cluster of six3/6-six1/2-six4/5 was found in both species, 
and a cluster of three unx genes was found only in P. flava. Ten more 
clusters were found containing two homeobox genes each. Notably, 
we found species-specific homeobox clusters in both species. Three 
remarkable clusters were found in S. kowalevskii in which 10, 12 and 
5 homeobox-containing genes are tandem duplicated in scaffold_1710, 52 
and _4796, respectively. We also found such clusters in P. flava in which 
7, 4, 8 and 10 genes are aligned on scaffold 19451, scaffold 1398, scaffold 
12422 and scaffold 154657, respectively. All homeobox genes identified 
in the genomes of the two hemichordates and amphioxus are listed in the 
Supplementary Table for Extended Data Fig. 4. This list includes some 
genes not containing a homeobox (for example, pax1/9) in cases where 
other family members do (for example, pax2). 


Extended Data Figure 5 | High retention rate of micro-synteny in 
Saccoglossus. Circos plot showing micro-syntenic conservation in blocks 
of genes (nmax = 10 and nmin = 2) for six metazoan species for observed 
(left) and simulated (right) linkages. The width of connecting segments is 
proportional to the number of genes participating in the syntenic linkages 
(normalized by the total gene count). In this representation scaffolds are 
placed end-to-end, and adjacent scaffolds need not be from the same 
chromosome. While simulated data yields some blocks shared between 
pairs of species, few or no synteny blocks can be recovered among three 
or more species (Methods). Saccoglossus shows one of the highest 
retentions among the selected species (and the highest among the 
sequenced ambulacrarians). Xenopus (and vertebrates in general) have lost 
some micro-synteny due to whole-genome duplications and differential 
loss of paralogues. The matching between the hemichordate S. kowalevskii 
and the chordate amphioxus is highest, consistent with the fact that neither 
genome has undergone extensive gene loss (as have tunicates) or 
pseudo-tetraploidization with extensive loss of paralogues (as have vertebrates). 

Extended Data Figure 6 | Deuterostome specific micro-syntenic 
linkages. a, b, Very tight linkages with no intervening genes. a, ParaHox 
cluster shown in S. kowalevskii, P. flava, and human. b, bmp2/4 and 
univin cluster in the hemichordates S. kowalevskii and P. flava, the 
sea urchin S. purpuratus, and the cephalochordate B. floridae. 
c-e, Loose micro-syntenic linkages with a maximum of five intervening 
genes: lefty (c), six1-six4 (d), and fgf8-fbxw (e) clusters. For c to e all 
species with micro-synteny are shown. Numbers above the genes indicate 
the copy number in the locus. 

Extended Data Figure 7 | Three examples showing the domain structures 
of some proteins encoded by genes found in deuterostomes and marine 
microbes but not non-deuterostome animals. Best BLASTP hits of the 
Saccoglossus sequence in human/mouse, as well as in non-deuterostome 
metazoans and in non-metazoans (such as the cyanobacterium Stanieria 
cyanosphaera, or the eukaryotic micro-alga Ostreococcus tauri) are shown. 
a, Cytidine monophosphate-N-acetylneuraminic acid hydroxylase (CMAH), 
an enzyme of sialic acid modification; b, peptidyl arginyl deiminase (PAD), 
an enzyme of post-translational modification of proteins; c, FATSO-like, 
also called α-ketoglutarate-dependent dioxygenase FTO, an enzyme that 
de-methylates N6-methyladenosine in nuclear RNA. Other analyses of these 
and other genes with the unusual phylogenetic distributions can be found in 
Supplementary Note 10. 

Extended Data Figure 8 | In situ hybridization demonstration of the 
expression of von Willebrand type D (vWD) domain-encoding genes 
(putative glycoproteins/mucins) in Saccoglossus and Ptychodera. 
a, In Saccoglossus the genes are specifically expressed in different 
subregions of the ectoderm of the proboscis or collar at these pre-feeding 
stages. b, In Ptychodera, several of the genes are expressed in endoderm 
as well as ectoderm of the developing tornaria larva. The sequence IDs for 
the genes are provided in Supplementary Note 10.4. 
[Figure not reproduced. Panel b columns: blastula, early gastrula, late 
gastrula, tornaria, juvenile; rows: vWD 1, vWD 3, vWD 4.] 

Extended Data Figure 9 | Gene innovation in deuterostomes. a, FastTree 
phylogenetic tree of the TGFβ family members Lefty, TGFβ2, GDF8/11 
and Nodal ligands (using GTR model). Bootstrap support is plotted as 
filled circles (size proportional to the support value) on each node. While 
Lefty shows deuterostome unique sequence composition, TGFβ2 has 
an acceleration of sequence change at the deuterostome stem branch, 
compared to the GDF8/11 or Nodal groups. b, Temporal co-expression 
of Lefty and TGFβ receptor type II in Saccoglossus at pre-gastrulation 
developmental stages and of TGFβ2 and TGFβ receptor type I at 
post-gastrulation stages. c, In situ hybridization demonstration of the 
expression in S. kowalevskii of one of the putative type I novelty genes 
(c9orf9, also known as rsb66) and of two of AAADC genes (aromatic 
amino acid decarboxylases of the microbial type) of S. kowalevskii (also in 
P. flava and B. floridae), which closely resemble sequences from bacteria 
rather than from non-deuterostome metazoans. gs, gill slits. d, The 
temporal expression profile for c9orf9 during S. kowalevskii development, 
taken from transcriptome data. 

Examples of deuterostome gene novelties and their genomic features 

A perisinusoidal niche for extramedullary 
haematopoiesis in the spleen 


Christopher N. Inra, Bo O. Zhou, Melih Acar, Malea M. Murphy, James Richardson, Zhiyu Zhao & Sean J. Morrison 


Haematopoietic stresses mobilize haematopoietic stem cells (HSCs) from the bone marrow to the spleen and induce 
extramedullary haematopoiesis (EMH). However, the cellular nature of the EMH niche is unknown. Here we assessed 
the sources of the key niche factors, SCF (also known as KITL) and CXCL12, in the mouse spleen after EMH induction by 
myeloablation, blood loss, or pregnancy. In each case, Scf was expressed by endothelial cells and Tcf21+ stromal cells, 
primarily around sinusoids in the red pulp, while Cxcl12 was expressed by a subset of Tcf21+ stromal cells. EMH induction 
markedly expanded the Scf-expressing endothelial cells and stromal cells by inducing proliferation. Most splenic HSCs 
were adjacent to Tcf21+ stromal cells in red pulp. Conditional deletion of Scf from spleen endothelial cells, or of Scf or 
Cxcl12 from Tcf21+ stromal cells, severely reduced spleen EMH and reduced blood cell counts without affecting bone 
marrow haematopoiesis. Endothelial cells and Tcf21+ stromal cells thus create a perisinusoidal EMH niche in the spleen, 
which is necessary for the physiological response to diverse haematopoietic stresses. 


The haematopoietic system employs facultative niches that arise in 
response to injury. Adult haematopoiesis occurs primarily in the bone 
marrow of mammals. However, a wide range of haematopoietic stresses 
including myelofibrosis, anaemia, pregnancy, infection, myeloablation 
and myocardial infarction can induce EMH, in which HSCs 
are mobilized to sites outside the bone marrow to expand haematopoiesis. 
The splenic red pulp is a prominent site of EMH in mice and 
humans. During EMH, HSCs are found mainly around sinusoids in 
the red pulp, raising the possibility of a perisinusoidal niche. CXCL12 
is expressed by sinusoidal endothelial cells in the red pulp of the human 
spleen and macrophage ablation reduces splenic erythropoiesis after 
irradiation. However, little else is known about the EMH niche. 


Niche factor expression in the spleen 

HSCs are rare in normal adult spleen but myeloablation with cyclophosphamide 
followed by daily administration of granulocyte colony-stimulating 
factor (G-CSF) induces HSC mobilization from the bone marrow to 
the spleen and induction of EMH. Cyclophosphamide plus 21 days of 
G-CSF (Cy+21 d G-CSF) increased erythropoiesis and myelopoiesis in 
the red pulp, profoundly increasing spleen size, spleen cellularity, HSC 
number and progenitor numbers relative to control spleens (Extended 
Data Fig. 1c, f-m). 

In normal adult spleens from Scf^GFP; Cxcl12^DsRed mice, and after 
EMH induction, Scf-green fluorescent protein (GFP) and Cxcl12-DsRed 
were primarily expressed throughout the red pulp (Fig. 1a, b and 
Extended Data Fig. 1a-e). Red pulp endothelial cells and perivascular 
stromal cells expressed high levels of Scf-GFP, irrespective of EMH 
induction (Fig. 1a-c and Extended Data Fig. 1d, e). In white pulp, Scf-GFP 
was expressed by many fewer stromal cells and central arteriolar 
endothelial cells (Fig. 1b and Extended Data Fig. 1e). Cxcl12-DsRed was 
not expressed by endothelial cells but was expressed by a subset of 
Scf-GFP+ perivascular stromal cells, primarily around red pulp sinusoids 
and to a lesser extent around white pulp central arterioles (Fig. 1a-c 
and Extended Data Fig. 1d, e). 

Scf-GFP+ cells were 0.48 ± 0.10% of enzymatically dissociated 
adult spleen cells (Fig. 1d) and Cxcl12-DsRed+ cells were 
0.031 ± 0.011% (Fig. 1f). Most Scf-GFP+ cells (75 ± 5.8%) were 
VE-cadherin+CD45−Ter119− endothelial cells (Fig. 1d): 85 ± 8.2% 
of all VE-cadherin+CD45−Ter119− spleen endothelial cells were 
Scf-GFP+ and none expressed Cxcl12-DsRed (Fig. 1e). Non-endothelial 
Scf-GFP+ cells were virtually all PDGFR-β+CD45−Ter119− stromal 
cells (Fig. 1d). Some Scf-GFP+ stromal cells (22 ± 3.8%) also 
expressed Cxcl12-DsRed (Fig. 1d). Virtually all Cxcl12-DsRed+ stromal 
cells expressed Scf-GFP (Fig. 1f). Therefore, Scf was expressed by 
VE-cadherin+ endothelial cells and PDGFR-β+ stromal cells while 
Cxcl12 was expressed by a minority of Scf-expressing stromal cells in 
adult spleen. 

EMH induction did not appear to alter spleen Scf-GFP or Cxcl12-DsRed 
expression (Fig. 1a versus Extended Data Fig. 1d). Flow cytometric 
analysis showed no change in the fluorescence intensity of 
individual Scf-GFP+ or Cxcl12-DsRed+ spleen cells after EMH induction 
(Extended Data Fig. 1o, p). However, the frequencies and absolute 
numbers of Scf-GFP+ and Cxcl12-DsRed+ cells increased significantly 
upon EMH induction (Fig. 1g-j and Extended Data Fig. 1q, r). These 
cells rarely divided in normal adult spleen but proliferated upon EMH 
induction (Fig. 1j, k). 

LepR+ stromal cells are the main sources of SCF and CXCL12 for 
HSC maintenance in the bone marrow. In the spleens of Lepr^cre; 
R26^tdTomato mice, recombination occurred mainly in the white pulp, 
where HSCs are not observed (Extended Data Fig. 1s). Only about 
20% of Scf-GFP+ stromal cells expressed LepR (Extended Data Fig. 1t). 
LepR+ cells were PDGFR-β+VE-cadherin− stromal cells that accounted 
for 37 ± 13% of colony-forming-unit fibroblasts (CFU-Fs) formed by 
enzymatically dissociated spleen cells (Extended Data Fig. 1u, v). 

Consistent with our prior study, Lepr^cre; Scf^fl/− mice had significantly 
fewer CD150+CD48−LSK HSCs in the bone marrow and significantly 
increased spleen cellularity relative to Scf^+/− and Scf^+/+ 
controls (Extended Data Fig. 1w, x). Upon EMH induction by cyclophosphamide 
plus 4 days of G-CSF (Cy+4 d G-CSF), Lepr^cre; Scf^fl/− 
mice exhibited significant declines in spleen cellularity and spleen HSC 
number relative to controls (Extended Data Fig. 1x, y). While LepR+ 
perivascular stromal cells could contribute to the EMH niche in adult 
spleen, the impaired EMH in these mice may also reflect bone marrow 
HSC depletion before EMH induction (Extended Data Fig. 1w). 
Figure 1 | Endothelial cells and perivascular stromal cells in the red
pulp express Scf and Cxcl12 and proliferate upon induction of EMH.

a, b, Scf-GFP and Cxcl12-DsRed were mainly expressed by stromal cells in
the red pulp of normal spleens. b, High-magnification view of the boxed
area in a. Dashed lines depict the boundary between white pulp (WP) and
red pulp (RP). Arrow indicates central arteriole in the white pulp, around
which rare stromal cells expressed Cxcl12-DsRed. c, Splenic red pulp from
Scf(GFP); Cxcl12(DsRed) mice had VE-cadherin+ endothelial cells (arrows)
that expressed Scf-GFP and VE-cadherin- stromal cells (arrowheads) that
expressed Scf-GFP and sometimes Cxcl12-DsRed. VE-Cad, VE-cadherin.
d-f, Flow cytometric analysis of enzymatically dissociated spleen cells
from Scf(GFP); Cxcl12(DsRed) mice. Scf-GFP was expressed by VE-cadherin+
endothelial cells and PDGFR-β+VE-cadherin- stromal cells, a subset of
which also expressed Cxcl12-DsRed (d). Most VE-cadherin+ endothelial
cells were positive for Scf-GFP but negative for Cxcl12-DsRed (e). Most
Cxcl12-DsRed+ cells were positive for Scf-GFP (f). Data in a-f represent
mean ± s.d. from 3 mice from 3 independent experiments. g-l, The
frequencies and absolute numbers of Scf-GFP+ cells (g, i) and
Cxcl12-DsRed+ cells (h, j) significantly increased upon induction of EMH
by Cy+21 d G-CSF (+EMH). k, l, 5-Bromo-2'-deoxyuridine (BrdU) was
co-administered to Scf(GFP) (k) or Cxcl12(DsRed) mice (l) along with
G-CSF for 7 days after cyclophosphamide treatment. Data represent
mean ± s.d. from 3 independent experiments. The numbers of mice per
treatment are shown on the bars in panels. g-l, Two-tailed Student's
t-tests were used to assess statistical significance (**P < 0.01,
***P < 0.001).
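The significance test named in this legend can be illustrated with a minimal sketch. It assumes the equal-variance (pooled) form of Student's t-test, which the legend does not specify, and it obtains the P value by numerically integrating the t density. The function names and the sample values are hypothetical and are not data from this study.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_t_test(a, b, steps=2000):
    """Pooled-variance two-sample t-test; returns (t, df, two-tailed P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    # Simpson's rule for the density between 0 and |t| (steps must be even).
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of falling within [-|t|, |t|]
    return t, df, max(0.0, 1.0 - central_mass)


def stars(p):
    """Map a P value to the asterisk notation used in the figure legends."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0]   # hypothetical values
    treated = [2.0, 3.0, 4.0]   # hypothetical values
    t, df, p = two_tailed_t_test(control, treated)
    print(f"t = {t:.3f}, df = {df}, P = {p:.3f} ({stars(p)})")
```

The sketch fails with a division by zero if both groups have zero variance; a production analysis would normally rely on an established statistics package instead.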
Tcf21+ perisinusoidal stromal cells express Scf
To identify cre alleles that recombine in spleen, but not bone marrow,
stromal cells, we assessed the gene expression profile of spleen
Scf-GFP+VE-cadherin- stromal cells (Extended Data Table 1). After testing
a number of cre alleles (see Extended Data Fig. 2), we found that
Tcf21-Cre/ER (ref. 21) recombined efficiently in spleen Scf-GFP+ stromal
cells (Fig. 2a) but not in bone marrow (Fig. 2b, c). Tcf21(cre/ER);
R26(tdTomato) mice gavaged with tamoxifen for 12 days at 4-6 weeks of age
expressed Tomato in Scf-GFP+ stromal cells throughout the red pulp
(Fig. 2a, d), whereas Tomato was expressed only in rare white pulp cells
(Fig. 2a) and in no endothelial cells (Fig. 2d, e). Tomato+CD45-Ter119-
stromal cells from enzymatically dissociated Tcf21(cre/ER); R26(tdTomato)
spleens accounted for 0.085 ± 0.045% of spleen cells and 69 ± 2% of spleen
CFU-Fs (Fig. 2f, g). These cells were PDGFR-β+ and LepR- (Fig. 2f).

In the liver, Scf-GFP was exclusively expressed by VE-cadherin+
endothelial cells (Extended Data Fig. 2a, b). Tcf21-Cre/ER recombined
in 0.09% of liver cells, none of which expressed Scf-GFP (Extended
Data Fig. 2a, c). The Tcf21-Cre/ER recombination pattern did not
significantly change in the spleen (Fig. 2f and Extended Data Fig. 2d, e),
bone marrow (Extended Data Fig. 2f, g), or liver (Extended Data
Fig. 2h, i) upon EMH induction by Cy+21 d G-CSF.

c-Kit+ haematopoietic progenitors were almost exclusively within
the red pulp in the normal spleen (Extended Data Fig. 3a, b) and after
EMH induction (Fig. 2k). To assess HSC localization we used a new
technique that permits deep imaging of α-catulin-GFP+c-Kit+ HSCs in
optically cleared haematopoietic tissues. In the spleens of mice treated
with Cy+4 d G-CSF, only 0.019 ± 0.01% of splenocytes were
α-catulin-GFP+c-Kit+ (Fig. 2h). All long-term multilineage reconstituting
cells in the spleen were α-catulin-GFP+ and 28% of α-catulin-GFP+c-Kit+
spleen cells gave long-term multilineage reconstitution in primary
(Fig. 2i) and secondary irradiated recipient mice (data not shown).

After antibody staining of a large segment of Tcf21(cre/ER);
R26(tdTomato); α-catulin(GFP) spleen, we cleared the tissue (Extended
Data Fig. 3c, d), then imaged to a depth of 300 μm and digitally
reconstructed the tissue (Extended Data Fig. 3e, f and Supplementary
Video 1). α-Catulin-GFP+c-Kit+ HSCs were found exclusively within the red
pulp, where 80% were within 5 μm of Tomato+ stromal cells (Fig. 2j).


EMH requires SCF and CXCL12 from Tcf21+ cells

To test whether Tcf21(cre/ER)-expressing perivascular cells promote
EMH, we treated 4-6-week-old Tcf21(cre/ER); Scf(fl/fl) and littermate
control mice with tamoxifen for 12 days. A month later, bone marrow
and spleen cellularity, blood cell counts, and bone marrow
haematopoiesis were similar in Tcf21(cre/ER); Scf(fl/fl) mice and
littermate controls (Fig. 3a-f and Extended Data Fig. 3g-l). Then we
treated Tcf21(cre/ER); Scf(fl/fl) mice and littermate controls with
cyclophosphamide followed by 4, 8, or 21 days of G-CSF. Tcf21(cre/ER);
Scf(fl/fl) mice did not differ from controls with respect to bone marrow
cellularity (Fig. 3a) or the numbers of HSCs (Fig. 3b), common myeloid
progenitors (CMPs), granulocyte-macrophage progenitors (GMPs), or
megakaryocyte-erythroid progenitors (MEPs) in the bone marrow after
Cy+4-21 d G-CSF treatment (Extended Data Fig. 3j-l). In contrast,
Tcf21(cre/ER); Scf(fl/fl) mice had significantly fewer splenocytes
(Fig. 3c), spleen HSCs (Fig. 3d), CMPs (Fig. 3e), GMPs (Extended Data
Fig. 3m) and MEPs (Fig. 3f) relative to littermate controls after
Cy+8-21 d G-CSF treatment. We did not detect any difference between
Tcf21(cre/ER); Scf(fl/fl) mice and littermate controls in terms of
vascular or stromal cell morphology in the spleen, with or without
induction of EMH (Extended Data Fig. 4a-g). Conditional deletion of Scf
with Tcf21-Cre/ER thus depletes HSCs and reduces EMH in the spleen
without affecting bone marrow haematopoiesis.

Red blood cell (RBC) and white blood cell (WBC) counts were
significantly lower in Tcf21(cre/ER); Scf(fl/fl) mice as compared to
controls after Cy+8-21 d G-CSF treatment (Extended Data Fig. 3g-i).
Splenectomy significantly reduced RBC and WBC counts in mice treated
with Cy+G-CSF, demonstrating that splenic EMH is necessary for the
Figure 2 | During EMH most HSCs localize adjacent to Tcf21+ stromal
cells in the red pulp. a, Tamoxifen-treated adult Tcf21(cre/ER);
R26(tdTomato) mice exhibited widespread Tomato expression by perivascular
stromal cells in the red pulp (RP). b, c, No Tomato expression in bone
marrow from tamoxifen-treated Tcf21(cre/ER); R26(tdTomato) mice. d, e,
Most Scf-GFP+VE-cadherin- stromal cells were Tomato+ (arrows) whereas
Scf-GFP+VE-cadherin+ endothelial cells were Tomato- (arrowhead).
f, Tomato+CD45-Ter119- stromal cells from enzymatically dissociated
spleen from Tcf21(cre/ER); R26(tdTomato) mice were positive for PDGFR-β
but negative for LepR, irrespective of EMH induction by Cy+G-CSF.
g, Percentage of all CFU-F colonies formed by enzymatically dissociated
Tcf21(cre/ER); R26(tdTomato) spleen cells that were Tomato+. Macrophage
colonies were excluded by staining with anti-CD45 antibody.
h, α-Catulin-GFP+c-Kit+ HSCs represented 0.019 ± 0.01% of dissociated
spleen cells in α-catulin(GFP) mice with EMH. i, α-Catulin-GFP+c-Kit+
splenocytes were highly enriched for long-term multilineage
reconstituting (LTMR) HSCs. j, k, Deep imaging of α-catulin-GFP+c-Kit+
HSCs (arrows in k) in optically cleared spleen from a Tcf21(cre/ER);
R26(tdTomato); α-catulin(GFP) mouse with EMH induced by Cy+21 d G-CSF.
The distance from α-catulin-GFP+c-Kit+ HSCs or random spots to Tomato+
stromal cells (j; *P < 0.05 by two-tailed Student's t-test).
α-Catulin-GFP+c-Kit+ HSCs were exclusively in the red pulp (k; see
Extended Data Fig. 3f for a low-magnification view). All data reflect
mean ± s.d. from 3 mice in 3 independent experiments.


recovery of blood cell counts (Fig. 3g, h and Extended Data Fig. 3n).
However, conditional deletion of Scf by Tcf21-Cre/ER did not further
reduce blood cell counts in splenectomized mice (Fig. 3g, h). SCF
expression by Tcf21+ stromal cells in the spleen is thus necessary for
the regeneration of blood cells after Cy+G-CSF treatment.

Bone marrow cellularity and bone marrow haematopoiesis were
similar in Tcf21(cre/ER); Cxcl12(fl/-) mice and littermate controls,
before and after Cy+G-CSF treatment (Fig. 3i, j and Extended Data
Fig. 3r-t). However, Tcf21(cre/ER); Cxcl12(fl/-) mice exhibited
significantly reduced spleen cellularity (Fig. 3k) and numbers of spleen
CMPs, GMPs and


468 | NATURE | VOL 527 | 26 NOVEMBER 2015 


[Figure 3, panels a-o: bar charts of BM cellularity, BM HSCs, SP
cellularity, SP HSCs, SP CMPs, SP MEPs and blood counts for not-treated
(NT) mice and mice treated for the indicated number of days.]


Figure 3 | Tcf21-expressing stromal cells are an important source of
SCF and CXCL12 for EMH in the spleen. a-f, Tcf21(cre/ER); Scf(fl/fl) and
Scf(fl/fl) control mice were treated with tamoxifen then examined 1 month
later either under normal conditions (not treated (NT)) or after treatment
with Cy+4-21 d G-CSF to induce EMH. The number of bone marrow
(BM) cells (a) and bone marrow CD150+CD48-LSK HSCs (b) in one
femur plus one tibia as well as spleen (SP) cellularity (c) and the numbers
of HSCs (d), CMPs (e) and MEPs (f) in the spleen are shown. g, h,
Sham-operated and splenectomized mice were treated with Cy+21 d G-CSF
1 month after surgery: WBC (g) and RBC (h) counts are shown.
i-o, Tcf21(cre/ER); Cxcl12(fl/-) and Cxcl12(+/-) or Cxcl12(fl/-) control
mice were treated with tamoxifen then examined 1 month later either under
normal conditions (NT) or after treatment with Cy+4-21 d G-CSF to induce
EMH. The number of bone marrow cells (i) and bone marrow HSCs (j) in one
femur plus one tibia as well as spleen cellularity (k), numbers of HSCs
(l), CMPs (m) and MEPs (n) in the spleen are shown. o, Number of HSCs per
ml of blood in tamoxifen-treated control and Tcf21(cre/ER); Cxcl12(fl/-)
mice after Cy+21 d G-CSF. The numbers of mice per treatment are shown in
each bar in each panel. All panels reflect mean ± s.d. from three
independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001, statistical
significance relative to sham-operated Scf(fl/fl) mice. †P < 0.05,
††P < 0.01, statistical significance among other treatments.


MEPs (Fig. 3m, n and Extended Data Fig. 3u) relative to controls after
Cy+8-21 d G-CSF treatment. Although the number of HSCs in the
spleens of Tcf21(cre/ER); Cxcl12(fl/-) mice did not significantly differ
from littermate controls (Fig. 3l), HSC numbers were significantly
elevated in the blood (Fig. 3o) and in the bone marrow (Fig. 3j) of
Tcf21(cre/ER); Cxcl12(fl/-) mice after Cy+21 d G-CSF treatment. This
suggests that some HSCs were mobilized from the spleens of
Tcf21(cre/ER); Cxcl12(fl/-) mice. Tcf21(cre/ER); Cxcl12(fl/-) mice also
had significantly reduced RBC counts after Cy+21 d G-CSF treatment
(Extended Data Fig. 3o-q). We did not detect any difference between
Tcf21(cre/ER); Cxcl12(fl/-) mice and littermate controls in terms of the
frequency or morphology of vascular or stromal cells in the spleen, with
or without EMH (Extended Data Fig. 4h-n). Tcf21-Cre/ER-expressing stromal
cells are thus an important source of CXCL12 for spleen EMH but not bone
marrow haematopoiesis.


EMH requires SCF from endothelial cells 

We discovered that Vav1-Cre recombines efficiently in spleen, but
not bone marrow, endothelial cells. Vav1-cre; R26(tdTomato) mice
recombined throughout the red pulp in VE-cadherin+Scf-GFP+ cells but
only in rare white pulp cells (Fig. 4a-c). VE-cadherin+Scf-GFP+ cells
accounted for 0.37 ± 0.07% of enzymatically dissociated spleen cells
and 83 ± 5.3% of these cells recombined with Vav1-Cre (Fig. 4b). These
cells were negative for PDGFR-β (Extended Data Fig. 5a). Seventy ± 5%
of VE-cadherin+ endothelial cells were Tomato+ in the spleens of
Vav1-cre; R26(tdTomato) mice but only 8.4 ± 0.5% were Tomato+ in
bone marrow (Extended Data Fig. 5b, e-h). Endothelial cells from
Vav1-cre; Scf(fl/-) mice exhibited a 6.5-fold reduction in Scf transcript
levels (Extended Data Fig. 5c) and a 5.6-fold reduction in SCF protein
(Extended Data Fig. 5d) relative to endothelial cells from Scf(fl/-)
controls.

In the livers of Vav1-cre; R26(tdTomato); Scf(GFP) mice recombination
occurred in 26 ± 4.2% of VE-cadherin+Scf-GFP+ cells (Extended Data
Fig. 5i-k). Upon induction of EMH by Cy+G-CSF, Vav1-Cre recombination
did not significantly change in the spleen (Extended Data
Figs 5b and 6a, b), bone marrow (Extended Data Figs 5b and 6c, d) or
liver (Extended Data Fig. 6e, f).

Cxcl12 was not expressed by spleen endothelial cells (Fig. 1e).
Consistent with this, Vav1-cre; Cxcl12(fl/-) mice had normal blood
counts, cellularity, and numbers of HSCs, CMPs, GMPs and MEPs in
bone marrow and spleen after Cy+G-CSF (Extended Data Fig. 6g-s).

Vav1-Cre also recombines in haematopoietic cells but haematopoietic
cells do not express Scf and Vav1-cre; Scf(fl/-) mice have normal
HSC frequency and haematopoiesis in bone marrow. Prior to
EMH induction with Cy+G-CSF, Vav1-cre; Scf(fl/-) mice did not
significantly differ from Scf(fl/-) controls with respect to bone marrow
or spleen cellularity, or the numbers of HSCs, CMPs, GMPs or MEPs in
the bone marrow or spleen (Fig. 4d-i and Extended Data Fig. 6w-z).
After Cy+G-CSF treatment, bone marrow cellularity and numbers of
bone marrow HSCs, CMPs, GMPs or MEPs in Vav1-cre; Scf(fl/-) mice
were normal (Fig. 4d, e and Extended Data Fig. 6w-y). However, RBC
counts, spleen cellularity, and the numbers of spleen HSCs, CMPs
and MEPs declined in Vav1-cre; Scf(fl/-) mice relative to Scf(fl/-)
controls (Fig. 4f-i and Extended Data Fig. 6t-v).

The decline in blood cell counts in Vav1-cre; Scf(fl/-) mice after EMH
induction was caused by reduced spleen EMH because splenectomy
significantly reduced RBC and WBC counts but conditional deletion
of Scf in splenectomized Vav1-cre; Scf(fl/-) mice had no further effect
on blood cell counts (Fig. 4j, k). We did not detect any difference
between Vav1-cre; Scf(fl/-) mice and controls in terms of the frequency
or morphology of vascular or stromal cells in the spleen (Extended Data
Fig. 4o-u). Endothelial SCF expression is thus necessary for splenic
EMH and the recovery of blood cell counts after Cy+G-CSF.


The splenic EMH niche during pregnancy 

Erythropoiesis and myelopoiesis significantly increased in the red pulp
during pregnancy, profoundly increasing spleen cellularity, HSC number,
and progenitor numbers relative to non-pregnant mice (Extended
Data Fig. 7a-i). Just as in Cy+G-CSF-treated mice, Scf-GFP was largely
expressed by endothelial and perivascular stromal cells in the red pulp
and Cxcl12-DsRed was expressed by a subset of the Scf-GFP+ stromal
cells (Extended Data Fig. 7j-l). Pregnancy induced these cells to
proliferate, significantly expanding their numbers (Extended Data
Fig. 7m-o). In pregnant mice, Tcf21-Cre/ER recombined in spleen
PDGFR-β+LepR- stromal cells but not in bone marrow and rarely in
liver (Extended Data Fig. 7p-v). Vav1-cre; Scf mice were infertile,
preventing us from testing the endothelial contribution to EMH during
pregnancy.
Figure 4 | Endothelial cells are an important source of SCF for EMH
in the spleen. a, Vav1-cre; R26(tdTomato) mice exhibited vascular Tomato
expression throughout the splenic red pulp (RP). Tomato was also
expressed by haematopoietic cells in these mice but levels of Tomato
expression in endothelial cells were ~10-100 fold higher than in
haematopoietic cells. Therefore short-exposure images showed mainly
Tomato fluorescence in endothelial cells. WP, white pulp. b, c, Vav1-Cre
recombined in VE-cadherin+Scf-GFP+ endothelial cells (arrowheads in c)
but not in VE-cadherin-Scf-GFP+ perivascular stromal cells (arrows in c).
d-i, Vav1-cre; Scf(fl/-) mice and Scf(+/+) or Scf(fl/-) controls were not
treated (NT) or treated with Cy+4-21 d G-CSF to induce EMH. Data show the
number of bone marrow (BM) cells (d) and bone marrow HSCs (e) in one
femur plus one tibia as well as spleen (SP) cellularity (f) and the numbers of
HSCs (g), CMPs (h) and MEPs (i) in the spleen. j, k, WBC (j) and RBC (k)
counts in splenectomized and sham-operated mice before and after
Cy+21 d G-CSF treatment. The numbers of mice per treatment are shown
in the bars in each panel. All data reflect mean ± s.d. from 3 (a-c; j, k)
or 6 (d-i) independent experiments. d-i, *P < 0.05, **P < 0.01,
***P < 0.001, statistical significance relative to Scf(+/+). †P < 0.05,
††P < 0.01, †††P < 0.001, statistical significance between Scf(fl/-) and
Vav1-cre; Scf(fl/-) mice. j, k, *P < 0.05, **P < 0.01, ***P < 0.001,
statistical significance relative to sham-operated Scf(fl/-). †P < 0.05,
††P < 0.01, statistical significance between other treatments.


Pregnant Tcf21(cre/ER); Scf(fl/fl) females did not differ from Scf(fl/fl)
control females in terms of bone marrow cellularity (Fig. 5a), or the
numbers of HSCs (Fig. 5b), GMPs, CMPs or MEPs in the bone marrow
(Extended Data Fig. 8a-d). In contrast, pregnant Tcf21(cre/ER);
Scf(fl/fl) females exhibited significantly lower spleen cellularity and
numbers of HSCs, GMPs, CMPs, MEPs, myeloid and erythroid cells in the
spleen as compared to pregnant Scf(fl/fl) females (Fig. 5c-f and Extended
Data Fig. 8e, f). Pregnant Tcf21(cre/ER); Scf(fl/fl) females had
significantly lower RBC counts than pregnant Scf(fl/fl) controls
(Fig. 5g), and significantly lower fetal mass (Fig. 5h). SCF from Tcf21+
perivascular cells is thus necessary for splenic EMH and for the
expansion of erythropoiesis during pregnancy.
Figure 5 | SCF from endothelial cells and Tcf21+ stromal cells is
necessary for splenic EMH and adequate erythropoiesis after bleeding
or during pregnancy. a-h, Four-to-six-month-old female mice that had
been treated with tamoxifen at least 2 months earlier were mated with
normal wild-type males. a-f, Normal females and pregnant females at
gestation day 18.5 were analysed: the number of bone marrow (BM) cells
(a) and bone marrow HSCs (b) in one femur plus one tibia as well as
spleen (SP) cellularity (c), and the numbers of HSCs (d), CMPs (e) and
MEPs (f) in the spleen are shown. g, h, RBC counts (g) and fetal mass (h).
i-n, Four-to-six-month-old mice with the indicated genetic backgrounds
were repeatedly bled over a 2-week period then analysed: the number of
bone marrow cells (i) and bone marrow HSCs (j) in one femur plus one
tibia as well as spleen cellularity (k), and the numbers of HSCs (l), CMPs
(m) and MEPs (n) in the spleen are shown. o, RBC counts. The numbers
of mice per treatment are shown in each bar of each panel. All data reflect
mean ± s.d. from 4 (a-h) or 3 (i-o) independent experiments. *P < 0.05,
**P < 0.01, ***P < 0.001, statistical significance relative to normal mice.
†P < 0.05, ††P < 0.01, statistical significance between single mutants
and compound mutants. ‡P < 0.05, ‡‡P < 0.01, ‡‡‡P < 0.001, statistical
significance between Scf mutant mice and control mice after bleeding or
pregnancy.


Pregnant Tcf21(cre/ER); Cxcl12(fl/-) females also had significantly
reduced splenic cellularity and splenic erythropoiesis relative to
pregnant Cxcl12(fl/-) controls, without any changes in bone marrow
haematopoiesis (Extended Data Fig. 8i-x).


The splenic EMH niche after blood loss 

Repeated bleeding significantly increased erythropoiesis and
myelopoiesis in the red pulp, increasing spleen cellularity, HSC number,
and progenitor numbers relative to non-bled controls (Extended Data
Fig. 9a-i). Just as in Cy+G-CSF-treated mice, Scf-GFP was largely
expressed by endothelial cells and perivascular stromal cells in the red
pulp while Cxcl12-DsRed was expressed by a subset of Scf-GFP+ stromal
cells (Extended Data Fig. 9j-l). Blood loss induced the proliferation
of these cells, significantly expanding their numbers (Extended
Data Fig. 9m-o). In bled mice, Tcf21-Cre/ER recombined in red pulp
PDGFR-β+LepR- stromal cells, but not in bone marrow and rarely in
liver (Extended Data Fig. 9p-v). Vav1-Cre recombined in 66 ± 4.2% of
spleen endothelial cells, mainly in the red pulp, but only in 7.5 ± 4.0% of
bone marrow endothelial cells and 25 ± 5.8% of liver endothelial cells
(Extended Data Fig. 10a-h).

Bled Tcf21(cre/ER); Scf(fl/fl) mice or Vav1-cre; Scf(fl/fl) mice did not
differ from bled Scf(fl/fl) controls in bone marrow cellularity (Fig. 5i),
or the numbers of HSCs (Fig. 5j), GMPs, CMPs or MEPs in the bone marrow
(Extended Data Fig. 10i-l). In contrast, bled Tcf21(cre/ER); Scf(fl/fl)
mice and Vav1-cre; Scf(fl/fl) mice each had significantly lower RBC
counts, spleen cellularity, and numbers of HSCs, GMPs, CMPs, MEPs, myeloid
and erythroid cells in the spleen as compared to bled Scf(fl/fl) controls
(Fig. 5l-o and Extended Data Fig. 10m-p). Tcf21+ stromal cells and
endothelial cells are thus necessary for EMH in the spleen and for the
expansion of erythropoiesis after bleeding.

Endothelial and Tcf21+ stromal cells had additive effects on splenic
EMH and the recovery of RBC counts after bleeding. Bled Vav1-cre;
Tcf21(cre/ER); Scf(fl/fl) mice had similar bone marrow cellularity and
numbers of HSCs in the bone marrow as bled Scf(fl/fl) controls
(Fig. 5i, j). However, they had significantly reduced RBC counts, spleen
cellularity, and numbers of HSCs, MEPs and erythroid cells in the spleen
as compared to bled Scf(fl/fl) mice, bled Vav1-cre; Scf(fl/fl) mice, and
bled Tcf21(cre/ER); Scf(fl/fl) mice (Fig. 5k-n and Extended Data Fig. 10p).

Bled Tcf21(cre/ER); Cxcl12(fl/-) mice also had significantly reduced
cellularity, MEPs and erythroid cells in the spleen as well as
significantly reduced RBC counts as compared to bled Cxcl12(fl/-)
controls, without any differences in bone marrow haematopoiesis (Extended
Data Fig. 10q-e').

The EMH niche in mouse spleen is created by endothelial cells and
Tcf21-expressing stromal cells associated with red pulp sinusoids and
is functionally important for haematopoietic recovery from a range of
stresses. A prior study detected CXCL12 expression in endothelial
cells in human spleens. This suggests that endothelial cells are also
a component of the EMH niche in humans, but there may be species
differences in CXCL12 expression among niche cells. It is not
clear whether there is any relationship between the Cxcl12-abundant
reticular (CAR) cells that are part of the bone marrow niche and the
Cxcl12-expressing stromal cells in the splenic EMH niche. While bone
marrow CAR cells are LepR+ and Tcf21-, spleen CAR cells are Tcf21+
and LepR-.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Mice. All mice were maintained on a C57BL/6 background, including
Scf(GFP) (ref. 19), Scf(fl) (ref. 19), Cxcl12(DsRed) (ref. 18), Cxcl12(fl)
(ref. 18), R26(tdTomato) (ref. 26), Vav1-cre (ref. 24), Lepr(cre)
(ref. 27), Tcf21(cre/ER) (ref. 21) and α-catulin(GFP). To
induce Cre/ER activity in Tcf21(cre/ER) mice, 4-6-week-old mice were
administered 2 mg tamoxifen (Sigma) daily by oral gavage for 12
consecutive days. For induction of EMH, mice were injected at day 0 with
a single dose of 4 mg cyclophosphamide followed by daily injections of
5 μg G-CSF for 4-21 days. Both male and
female mice were used. All mice were housed in the Animal Resource Center at the
University of Texas Southwestern Medical Center (UTSW). All procedures were
approved by the UTSW Institutional Animal Care and Use Committee.

Flow cytometric analysis of haematopoietic cells. Bone marrow cells were
isolated by flushing the femur or tibia with Ca(2+)- and Mg(2+)-free HBSS
with 2% heat-inactivated bovine serum using a 3 ml syringe fitted with a
25-gauge needle. Spleen cells were obtained by crushing the spleen between
two frosted slides. The cells were dissociated to a single-cell suspension
by gently passing through the needle several times and then filtering
through a 40-μm nylon mesh. Blood
was collected by cardiac puncture, and white blood cells were isolated by Ficoll
centrifugation according to the manufacturer's instructions (GE Healthcare). The
following antibodies were used to isolate HSCs: anti-CD150 (TC15-12F12.2),
anti-CD48 (HM48-1), anti-Sca-1 (E13-161.7), anti-c-Kit (2B8) and the following
antibodies against lineage markers (anti-Ter119, anti-B220 (6B2), anti-Gr-1 (8C5),
anti-CD2 (RM2-5), anti-CD3 (17A2), anti-CD5 (53-7.3) and anti-CD8 (53-6.7)).
Haematopoietic progenitors were identified by flow cytometry using the following
antibodies: anti-Sca-1 (E13-161.7), anti-c-Kit (2B8) and the following antibodies
against lineage markers (anti-Ter119, anti-B220 (6B2), anti-Gr-1 (8C5), anti-CD2
(RM2-5), anti-CD3 (17A2), anti-CD5 (53-7.3) and anti-CD8 (53-6.7)), anti-CD34
(RAM34), anti-CD135 (Flt3) (A2F10), anti-CD16/32 (FcγR) (93), anti-CD127
(IL7Rα) (A7R34), anti-CD24 (M1/69), anti-CD43 (1B11), anti-B220 (6B2),
anti-IgM (II/41), anti-CD3 (17A2), anti-Gr-1 (8C5), anti-Mac-1 (M1/70),
anti-CD41 (MWReg30), anti-CD71 (C2) and anti-Ter119.
4',6-Diamidino-2-phenylindole (DAPI) was used to exclude dead cells.
Antibodies were obtained from eBioscience or BD Biosciences.

Flow cytometric analysis of stromal cells. To isolate bone marrow stromal cells
the marrow was gently flushed out of the bone marrow cavity with a 3-ml syringe
fitted with a 23-gauge needle and then transferred into 1 ml pre-warmed bone
marrow digestion solution (200 U ml(-1) DNase I (Sigma), 250 μg ml(-1)
Liberase (Roche) in HBSS plus Ca(2+) and Mg(2+)) and incubated at 37 °C
for 30 min with gentle shaking. To isolate splenic stromal cells, the
spleen capsule was cut into ~1 mm(3) fragments using scissors and then
digested as described earlier in spleen digestion solution (200 U ml(-1)
DNase I, 250 μg ml(-1) Liberase, 1 mg ml(-1) collagenase, type 4 (Roche)
and 500 μg ml(-1) collagenase D (Roche) in HBSS plus Ca(2+) and Mg(2+)).
After a brief vortex, the spleen fragments were allowed to sediment for
~3 min and the supernatant was transferred to another tube on ice. The
sedimented (undigested) spleen fragments were subjected to a second round
of digestion. The two fractions of digested cells were pooled and filtered
through a 100-μm nylon mesh.
Anti-PDGFR-α (APA5), anti-PDGFR-β (APB5), anti-LepR (R&D), anti-CD45
(30F-11) and anti-Ter119 antibodies were used to isolate stromal cells. For analysis
of endothelial cells, mice were injected intravenously into the retro-orbital venous
sinus with 10 μg Alexa-Fluor-660-conjugated anti-VE-cadherin antibody (BV13)
10 min before being killed. Samples were analysed using a FACSAria or FACSCanto
II flow cytometer (BD Biosciences).

BrdU incorporation assay. To assess BrdU incorporation into spleen cells after 
EMH induction, mice were intraperitoneally injected with a single dose of BrdU 
(2mg BrdU per mouse) then maintained on 0.5 mg BrdU per ml drinking water 
for 7 days. Endothelial cells were labelled by intravenous injection of an anti-VE- 
cadherin antibody (eBioscience). Enzymatically dissociated spleen cells were 
stained with antibodies against surface markers and the target cell populations 
were sorted then resorted to ensure purity. The sorted cells were then fixed, and 
stained with an anti-BrdU antibody using the BrdU APC Flow Kit (BD Biosciences) 
according to the manufacturer’s instructions. 

Long-term competitive reconstitution assay. Adult recipient mice were irradi- 
ated using an XRAD 320 X-ray irradiator (Precision X-Ray) with two doses of 
540 rad (total 1,080 rad) delivered at least 2h apart. Cells were injected into the 
retro-orbital venous sinus of anaesthetized mice. Sorted doses of splenocytes from 
donor mice with EMH were transplanted along with 3 x 10° recipient bone marrow 
cells. Recipient mice were bled every 4 weeks to assess the level of donor-derived 
blood cells, including myeloid, B and T cells for at least 16 weeks. Blood was sub- 
jected to ammonium chloride/potassium red cell lysis before antibody staining. 
Antibodies including anti-CD45.2 (104), anti-CD45.1 (A20), anti-Gr1 (8C5), 
anti-Mac-1 (M1/70), anti-B220 (6B2) and anti-CD3 (KT31.1) were used for flow 
cytometric analysis. 
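The readout of this assay, the level of donor-derived blood cells per lineage at each bleed, can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say which CD45 allele marks donor cells, and it gives no numerical cut-off for calling an animal reconstituted, so the threshold is a required argument rather than a built-in value. All names and event counts are hypothetical.

```python
LINEAGES = ("myeloid", "B", "T")


def donor_percent(donor_events, recipient_events):
    """Percentage of donor-derived cells among all CD45-typed events."""
    total = donor_events + recipient_events
    if total == 0:
        raise ValueError("no events recorded for this lineage")
    return 100.0 * donor_events / total


def is_long_term_multilineage(bleeds, threshold_pct, min_week=16):
    """Return True if every lineage exceeds threshold_pct at all bleeds
    taken at or after min_week.

    bleeds maps week -> {lineage: (donor_events, recipient_events)}.
    threshold_pct is an assumption supplied by the caller.
    """
    late = {week: counts for week, counts in bleeds.items() if week >= min_week}
    if not late:
        return False
    return all(
        donor_percent(*counts[lineage]) > threshold_pct
        for counts in late.values()
        for lineage in LINEAGES
    )


if __name__ == "__main__":
    # Hypothetical event counts for one recipient, bled every 4 weeks.
    bleeds = {
        4: {"myeloid": (50, 950), "B": (20, 980), "T": (5, 995)},
        16: {"myeloid": (120, 880), "B": (90, 910), "T": (40, 960)},
    }
    print(is_long_term_multilineage(bleeds, threshold_pct=1.0))
```

Requiring all three lineages at every late time point is one possible reading of "long-term multilineage"; the criterion actually applied in the study may differ.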


Tissue sectioning and confocal imaging. For bone marrow sections, freshly dis- 
sected bones were fixed in 4% paraformaldehyde overnight followed by 3 days of 
decalcification in 10% EDTA dissolved in PBS. Bones were sectioned using the 
CryoJane tape-transfer system (Instrumedics). For spleen sections, freshly dis- 
sected spleens were fixed in 4% paraformaldehyde for 1h followed by 1 day incuba- 
tion in 10% sucrose in PBS. Frozen spleens were sectioned with a cryostat (Leica). 
For whole mount imaging, spleens were sectioned into ~2 mm pieces. Spleen sec- 
tions were blocked in PBS with 10% horse serum for 1h and then stained overnight 
with chicken-anti-GFP (Aves) and/or rabbit-anti-laminin (Abcam) antibodies. 
Donkey-anti-chicken Alexa Fluor 488 and/or donkey-anti-rabbit Alexa Fluor 647 
were used as secondary antibodies (Invitrogen). Specimens were mounted with 
anti-fade prolong gold (Invitrogen) and images were acquired with either a Zeiss 
LSM780 confocal microscope or a Leica SP8 confocal microscope equipped with a 
resonant scanner. Three-dimensional images were achieved using Bitplane Imaris 
v.7.7.1 software. 

Deep imaging of spleens. Spleens were harvested and fixed for 4 h in 4% PFA at
4 °C. Since the spleen capsule is highly autofluorescent, spleens were sectioned
perpendicular to the long axis into 300-μm-thick sections using a Leica VT100S
vibratome. These 300-μm sections were fixed for an additional 2 h in 4% PFA and
blocked overnight in staining solution (10% dimethylsulfoxide (DMSO), 0.5%
IgePal630 (Sigma) and 5% donkey serum (Jackson Immunoresearch) in PBS).
All staining steps were performed in staining solution on a rotator at
room temperature. Spleen sections were stained for 3 days in primary
antibodies, washed overnight in several changes of PBS then stained for
3 days in secondary antibodies. The stained sections were dehydrated in a
methanol dehydration series then incubated for 3 h in 100% methanol with
several changes. The methanol was
then exchanged with benzyl alcohol:benzyl benzoate 1:2 mix (BABB clearing).
The tissues were incubated in BABB for 3 h to overnight with several exchanges
of fresh BABB. Spleen sections were mounted in BABB between two coverslips
and sealed with silicone (Premium waterproof silicone II clear; General Electric).
We found it necessary to clean the BABB of peroxides (which can accumulate
as a result of exposure to air and light) by adding 10 g of activated aluminium
oxide (Sigma) to 40 ml of BABB and rotating for at least 1 h, then centrifuging at
2,000g for 10 min to remove the suspended aluminium oxide particles. Images were
acquired using a Zeiss LSM780 confocal microscope with a Zeiss LD LCI Plan-Apo
×25/0.8 multi-immersion objective lens, which has a 570 μm working distance.
Images were taken at 512 × 512 pixel resolution with 2 μm Z-steps, pinhole
for the internal detector at 47.7 μm. Random spots were inserted into
images by generating randomized X, Y, and Z coordinates using the random
integer generator at http://www.random.org.
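The random-spot comparison can be sketched in a few lines. This is an illustration only: it swaps in Python's built-in pseudo-random generator for the random.org service named above, treats stromal cells as single points rather than segmented surfaces, and assumes all coordinates are already in micrometres. The function names, volume dimensions and coordinates are hypothetical.

```python
import math
import random


def random_spots(n, x_max, y_max, z_max, seed=None):
    """Draw n spots with random integer X, Y and Z coordinates inside the volume."""
    rng = random.Random(seed)
    return [
        (rng.randint(0, x_max), rng.randint(0, y_max), rng.randint(0, z_max))
        for _ in range(n)
    ]


def nearest_distance(spot, targets):
    """Euclidean distance from one spot to the closest target point."""
    return min(math.dist(spot, target) for target in targets)


def fraction_within(spots, targets, cutoff_um):
    """Fraction of spots lying within cutoff_um of any target point."""
    hits = sum(1 for spot in spots if nearest_distance(spot, targets) <= cutoff_um)
    return hits / len(spots)


if __name__ == "__main__":
    # Hypothetical stromal-cell positions in a 500 x 500 x 300 um volume.
    stromal = random_spots(200, 500, 500, 300, seed=1)
    control = random_spots(1000, 500, 500, 300, seed=2)
    print(f"random spots within 5 um: {fraction_within(control, stromal, 5.0):.3f}")
```

The same `fraction_within` call applied to measured HSC coordinates would give the quantity compared against the random-spot baseline.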

Splenectomy. After mouse anaesthesia by ketamine/xylazine, a ventral midline 
incision was made and the peritoneum was breached. The splenic blood vessels 
were ligated with an absorbable suture (4-0 vicryl). The splenic vessels were cut 
distal to the suture and the spleen was removed. The vessels were cauterized 
and the abdomen was sutured with non-absorbable sutures (3-0 Tevdek III). 
Buprenorphine was administered every 12h for 3 days to minimize postoperative 
pain and mice were maintained with ampicillin-containing water to avoid infec- 
tion. Complete blood counts were measured one month after the survival surgery. 
Induction of EMH by bleeding. EMH was induced by repeated bleeding over a
2-week period according to a published protocol. Briefly, 4-6-month-old mice
were bled via the tail vein five times, every 3 days, removing approximately 250 μl
of blood each time, then the mice were killed for analysis 2 days after the last bleed.
Western blot. Approximately 30,000 CD45 Ter119- VE-cadherin* splenic 
endothelial cells were flow cytometrically sorted into 50 11 of 66% trichoracetic acid 
(TCA) in water. Extracts were incubated on ice for at least 15 min and centrifuged at 
16,100 g at 4°C for 10 min. Precipitates were washed in acetone twice and the dried 
pellets were solubilized in 9 M urea, 2% Triton X-100, and 1% dithiothreitol (DTT). 
Samples were separated on 4-12% Bis-Tris polyacrylamide gels (Invitrogen) 
and transferred to PVDF membrane (Millipore). The blots were incubated 
with primary antibodies overnight at 4°C and then with secondary antibodies. 
Blots were developed with the SuperSignal West Femto chemiluminescence kit
(Thermo Scientific). Primary antibodies used: rabbit-anti-SCF (Abcam, 1:1,000) 
and mouse-anti-actin (Santa Cruz, clone AC-15, 1:20,000). 

Quantitative real-time PCR. Cells were sorted directly into Trizol (Life 
Technologies). Total RNA was extracted according to the manufacturer’s 
instructions. Total RNA was reverse transcribed using SuperScript III Reverse 
Transcriptase (Life Technologies). Quantitative real-time PCR was performed 
using SYBR green on a LightCycler 480 (Roche). β-Actin was used to normalize
the RNA content of samples. Primers used in this study were Scf:
5′-GCCAGAAACTAGATCCTTTACTCCTGA-3′ and
5′-CATAAATGGTTTTGTGACACTGACTCTG-3′; β-actin:
5′-GCTCTTTTCCAGCCTTCCTT-3′ and 5′-CTTCTGCATCCTGTCAGCAA-3′.
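
The normalization to β-actin can be illustrated with a short Python sketch. The text states only that β-actin was used to normalize RNA content; the 2^−ΔCt form used here is the common comparative Ct calculation and is an assumption, as is the premise of roughly 100% amplification efficiency. The function names and Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target abundance relative to the reference gene: 2^-(Ct_target - Ct_reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_vs_calibrator(sample: float, calibrator: float) -> float:
    """Ratio of two normalized expression values (sample versus calibrator)."""
    return sample / calibrator


# Hypothetical Ct values, for illustration only (Scf as target, beta-actin as reference).
sorted_cells = relative_expression(ct_target=22.0, ct_reference=18.0)
whole_spleen = relative_expression(ct_target=25.0, ct_reference=18.0)
print(fold_vs_calibrator(sorted_cells, whole_spleen))  # 8.0
```

Dividing by a calibrator sample, as in the last line, corresponds to comparing sorted cells with unfractionated cells after each has been normalized to the reference gene.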

Gene expression profiling. Three independent samples of 5,000 Scf-GFP+VE-cadherin−
spleen stromal cells and two independent samples of 5,000
unfractionated spleen cells were flow cytometrically sorted into Trizol. Total RNA 
was extracted, amplified, and sense strand cDNA was generated using the Ovation 
Pico WTA System V2 (NuGEN) according to the manufacturer’s instructions. 
cDNA was fragmented and biotinylated using the Encore Biotin Module (NuGEN) 
according to the manufacturer's instructions. Labelled cDNA was hybridized to 
Affymetrix Mouse Gene ST 1.0 chips according to the manufacturer's instructions. 
Expression values for all probes were normalized and determined using the robust 
multi-array average (RMA) method.

Statistical methods. Panels in all figures represented multiple independent experi- 
ments performed on different days with different mice. Sample sizes were not based 
on power calculations. No randomization or blinding was performed. No animals 
were excluded from analysis. Variation is always indicated using standard devia- 
tion. For analysis of the statistical significance of differences between two groups 
we generally performed two-tailed Student's t-tests. For analysis of the statistical 
significance of differences among more than two groups, we performed repeated 
measures one-way analysis of variance (ANOVA) tests with Greenhouse-Geisser 
correction (variances between groups were not equal) and Tukey’s multiple com- 
parison tests with individual variances computed for each comparison. To assess 
the statistical significance of differences in fetal mass between paired control and 
mutant mice (Fig. 5j and Extended Data Fig. 8v), we performed a two-way ANOVA. 
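
As an illustration of the two-group comparison described above, the following Python sketch computes the mean, the standard deviation and the Student's t statistic (pooled-variance form) using only the standard library. The function name and the example values are hypothetical. Obtaining a two-tailed P value additionally requires the t distribution with the returned degrees of freedom, which a statistics package would normally supply.

```python
import math
import statistics


def students_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two independent groups.

    Uses the classic Student's form, which pools the two sample variances
    and therefore assumes equal variance in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, df


# Hypothetical measurements for two groups of mice.
control = [1.0, 2.0, 3.0]
mutant = [4.0, 5.0, 6.0]
print(statistics.mean(control), statistics.stdev(control))  # mean and s.d. of one group
print(students_t(control, mutant))
```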

Extended Data Figure 1 | Cy+21 d G-CSF treatment induces EMH in
the spleen and deletion of Scf from LepR+ cells significantly reduces
the number of HSCs in the bone marrow and the spleen after induction
of EMH. a, b, Staining with anti-laminin antibody distinguished the
vasculature of red pulp (RP) from white pulp (WP). The red pulp and
white pulp were marked by clusters of Ter119+ cells (red) and CD3+ cells
(blue), respectively. Dashed line depicts the boundary between red
pulp and white pulp (representative images from 3 mice in 3 independent
experiments). c, Spleen sections of the same magnification show
the enlargement of the spleen after induction of EMH by Cy+21 d
G-CSF. These are the same images as in Fig. 1a, d, adjusted to reflect the
same magnification. d, e, Imaging of thick spleen sections from Scf^GFP;
Cxcl12^DsRed mice after the induction of EMH by Cy+21 d G-CSF.
e, High-magnification view of the boxed area in d. Dashed lines depict
the boundaries between white pulp and red pulp. Arrow indicates the
central arteriole in the white pulp around which stromal cells expressed
Cxcl12-DsRed (representative images from 3 mice from 3 independent
experiments). f, g, Haematoxylin and eosin (H&E) staining showing the
increase in haematopoiesis in the spleen after induction of EMH using
Cy+G-CSF (+EMH, g) as evidenced by the presence of megakaryocytes
(arrows; n = 3 mice per condition from 3 independent experiments).
h–n, Cy+G-CSF treatment significantly increased spleen cellularity
(h), as well as the numbers of HSCs (i), MEPs (j), frequencies of colony-
forming progenitors (k), numbers of Ter119+ erythroid cells (l) and
Gr-1+Mac-1+ myeloid cells (m) in the spleen but not the number of
B220+ or CD3+ lymphoid cells (n). The numbers of mice per treatment
are shown in each bar of each panel. Each panel shows mean ± s.d. from
five independent experiments. o, p, Scf-GFP (o) and Cxcl12-DsRed (p)
fluorescence by spleen stromal cells before (−EMH) and after induction
of EMH (+EMH) using Cy+G-CSF. q, r, The frequencies (q) and absolute
numbers (r) of Scf-GFP+VE-cadherin+ endothelial cells and Scf-GFP+VE-
cadherin− stromal cells significantly increased upon induction of EMH
by Cy+21 d G-CSF (+EMH). s, Spleens from Lepr^cre; R26^tdTomato; Scf^GFP
mice showed Tomato expression was primarily in the stromal cells of the
white pulp. Although most Scf-GFP expression was in endothelial cells
and perivascular stromal cells of the red pulp (Fig. 1a–d), some Scf-GFP+
stromal cells were in the white pulp, most of which appeared to express
LepR. Dashed line depicts the boundary between red pulp and white pulp
(representative images of 6 mice from 4 independent experiments).
t, Flow cytometric analysis of enzymatically dissociated spleen cells
from Lepr^cre; R26^tdTomato; Scf^GFP mice showed that only a small minority
of non-endothelial Scf-GFP+ cells were positive for Tomato (n = 3 mice
from 3 independent experiments). u, Tomato+CD45−Ter119− stromal
cells in the spleens of Lepr^cre; R26^tdTomato mice expressed PDGFR-α,
PDGFR-β, Sca-1 and LepR (n = 3 mice from 3 independent experiments).
v, Percentage of all CFU-F colonies formed by enzymatically dissociated
spleen cells from Lepr^cre; R26^tdTomato mice that expressed Tomato.
Macrophage colonies were excluded by staining with anti-CD45 antibody
(n = 4 mice from 3 independent experiments). w, Lepr^cre; Scf^fl/− mice had
significantly fewer HSCs in the bone marrow than wild-type and Scf^fl/−
controls before induction of EMH (n = 4 mice per genotype per time
point from 4 independent experiments). NT, not treated.
x, y, Lepr^cre; Scf^fl/− mice displayed significantly lower spleen cellularity
(x) and HSC number (y) in the spleen than wild-type and Scf^fl/− controls
after induction of EMH with cyclophosphamide plus 4 days of G-CSF.
The numbers of mice per treatment are shown in each bar. Data represent
mean ± s.d. from 4 independent experiments. h–n, q, r, The statistical
significance of differences was assessed using two-tailed Student's t-tests
(***P < 0.001). w–y, The statistical significance of differences between
genotypes was assessed using repeated measures one-way ANOVAs with
Greenhouse-Geisser correction and Tukey’s multiple comparison tests
with individual variances computed for each comparison. *P < 0.05,
**P < 0.01, statistical significance relative to wild-type (Scf^+/+). †P < 0.05,
††P < 0.01, statistical significance between Scf^fl/− and Lepr^cre; Scf^fl/−.

Extended Data Figure 2 | Scf is expressed by most endothelial cells
but not by Tcf21+ perivascular cells in the liver; Cy+21 d G-CSF does
not significantly change the recombination pattern of Tcf21-Cre/ER
in the spleen, bone marrow or liver. a–i, To identify Cre alleles that
recombine in spleen, but not bone marrow, stromal cells we assessed the
gene expression profile of spleen Scf-GFP+VE-cadherin− stromal cells
(Extended Data Table 1). Nestin, NG2 (also known as Cspg4) and Prx1
were low or undetectable (data not shown). Nestin-Cre (ref. 31), NG2-Cre
(ref. 32), NG2-Cre/ER (ref. 33) and Prx1-Cre (ref. 34) did not recombine
widely or specifically in
Scf-GFP+ stromal cells in the spleen (data not shown). Pdgfra and Pdgfrb
were expressed by spleen Scf-GFP+ stromal cells but neither Pdgfra-Cre/ER
(ref. 35) nor Pdgfrb-Cre (ref. 36) recombined efficiently (data not shown).
Sm22 (also known as Tagln), Myh11, Sma (also known as Acta2) and Tcf21
were significantly more highly expressed by spleen than bone marrow
Scf-GFP+ stromal cells (Extended Data Table 1). Sm22-Cre (ref. 37),
Myh11-Cre (ref. 38) and Sma-Cre/ER (ref. 39) recombined in few spleen
Scf-GFP+ stromal cells (data not shown). However, Tcf21-Cre/ER
recombined in perivascular stromal cells in the spleen but not bone
marrow (Fig. 2). a–c, Under normal conditions, Scf-GFP was expressed by
most VE-cadherin+ endothelial cells (arrowheads in a) but not by Tcf21+
stromal cells (arrows in a) in the liver (n = 3 mice from 3 independent
experiments). d, e, EMH induced by Cy+21 d G-CSF did not alter the
general distribution (d) or perivascular localization (e) of Tomato+ cells
in the spleens of Tcf21^cre/ER; R26^tdTomato mice as compared to normal mice
(Fig. 2a, d). f, g, Tomato expression was undetectable in the bone marrow
of Tcf21^cre/ER; R26^tdTomato mice after Cy+G-CSF treatment irrespective of
whether the bone marrow was analysed by whole-mount imaging (f) or
flow cytometry (g). h, i, EMH induced by Cy+G-CSF did not significantly
change the frequency (h) or perivascular localization (i, arrows) of
Tomato+ cells in the livers of Tcf21^cre/ER; R26^tdTomato mice. d–i, n = 3 mice
from 3 independent experiments.

Extended Data Figure 3 | Deep imaging of HSCs in the spleen; deletion
of Scf or Cxcl12 from Tcf21-expressing stromal cells in the spleen
reduced peripheral blood cell counts but did not affect bone marrow
haematopoiesis. a, b, The vast majority of c-Kit+ haematopoietic
progenitors localized adjacent to Tcf21-expressing stromal cells in the red
pulp of the normal spleen (n = 3 mice from 3 independent experiments).
c, d, Three-hundred-micrometre-thick sections of spleen before (c) and
after optical clearing (d). e, f, Deep imaging of α-catulin-GFP+c-Kit+
HSCs in cleared spleen segments from Tcf21^cre/ER; R26^tdTomato; α-catulin^GFP
mice. A representative high-magnification image of an α-catulin-GFP+c-Kit+
HSC surrounded by Tomato+ stromal cells (e). f, Low-magnification
view of a digitally reconstructed 300-µm-thick spleen fragment with
α-catulin-GFP+c-Kit+ HSCs identified by large yellow spheres. Note
that actual HSCs would be smaller than the yellow spheres but would
not be visible at this magnification (n = 3 mice from 3 independent
experiments). g–m, Tcf21^cre/ER; Scf^fl/fl and Scf^fl/fl control mice were treated
with tamoxifen then examined 1 month later without further treatment
(not treated (NT)) or after treatment with cyclophosphamide plus 4, 8,
or 21 days of G-CSF to induce EMH. Data show WBC (g), RBC (h) and
PLT counts (i), numbers of CMPs (j), GMPs (k) and MEPs (l) in the bone
marrow and numbers of GMPs in the spleen (m). n, Platelet counts of
sham-operated and splenectomized mice that were treated with Cy+21 d
G-CSF 1 month after surgery. o–u, Tcf21^cre/ER; Cxcl12^fl/− mice and
littermate controls (Cxcl12^fl/− or Cxcl12^+/−) were treated with tamoxifen
then examined 1 month later without further treatment (NT) or after
treatment with cyclophosphamide plus 4, 8, or 21 days of G-CSF to induce
EMH. Data show WBC (o), RBC (p) and PLT counts (q), numbers of
CMPs (r), GMPs (s) and MEPs (t) in the bone marrow and numbers of
GMPs in the spleen (u). The numbers of mice per treatment are shown in
each panel. All data reflect mean ± s.d. from 3 independent experiments.
Two-tailed Student's t-tests were used to assess statistical significance
(*P < 0.05, ***P < 0.001).

Extended Data Figure 4 | Conditional deletion of Scf or Cxcl12 with
Tcf21-Cre/ER, or Scf with Vav1-Cre, does not significantly affect the
frequency or morphology of stromal cells in the spleen, irrespective
of EMH induction. a–g, Irrespective of whether the mice were treated
with Cy+21 d G-CSF, conditional deletion of Scf from Tcf21+ cells did
not significantly change the frequency of VE-cadherin+ endothelial cells
(a) or PDGFR-β+ perivascular stromal cells (b), Scf transcript levels in
endothelial cells (c), or the morphology or density of blood vessels in
the spleen (d–g). h–n, Irrespective of whether the mice were treated
with Cy+G-CSF, conditional deletion of Cxcl12 from Tcf21+ cells did
not significantly change the frequency of VE-cadherin+ endothelial cells
(h) or PDGFR-β+ perivascular stromal cells (i), Scf transcript levels in
endothelial cells or perivascular stromal cells (j), or the morphology or
density of blood vessels in the spleen (k–n). o–u, Irrespective of whether
the mice were treated with Cy+G-CSF, conditional deletion of Scf using
Vav1-Cre did not significantly change the frequency of VE-cadherin+
endothelial cells (o) or PDGFR-β+ perivascular stromal cells (p), Scf
transcript levels in perivascular stromal cells (q), or the morphology
or density of blood vessels in the spleen (r–u). Scf transcript levels in
flow cytometrically isolated cells were normalized to β-actin and then
compared to whole spleen cells (c, j and q). The data reflect mean ± s.d.
from 3 mice per genotype per condition in 3 independent experiments.
Two-tailed Student's t-tests were used to assess statistical significance.


© 2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 5 image, panels a–k. Recoverable labels: Vav1-cre; R26^tdTomato spleen (flow cytometry: CD45/Ter119, Tomato, VE-cadherin, PDGFR-β); Scf transcript levels in VE-cadherin+ cells from spleen (SP), bone marrow (BM) and liver (LV); western blot of VE-cadherin+ cells from the spleen (anti-SCF, anti-actin); Vav1-cre; R26^tdTomato femur; Vav1-cre; R26^tdTomato; Scf^GFP bone marrow and liver (flow cytometry and Tomato/GFP/VE-cadherin imaging; scale bar, 25 µm).]

Extended Data Figure 5 | Vav1-Cre recombines efficiently and
specifically in spleen endothelial cells but poorly in bone marrow or
liver endothelial cells. a, Tomato+CD45−Ter119− cells in Vav1-cre;
tdTomato mice were uniformly positive for VE-cadherin and negative
for PDGFR-β (n = 3 mice from 3 independent experiments). b, Vav1-Cre
recombined in most spleen endothelial cells but in few bone marrow
endothelial cells, irrespective of Cy+G-CSF treatment (+EMH). c, Scf
transcript levels were significantly reduced in endothelial cells from
the spleen but not from bone marrow or liver in Vav1-cre; Scf^fl/− mice
as compared to Scf^fl/− mice. The Scf transcript level was normalized
to β-actin. d, Western blot showed lower SCF protein levels in splenic
endothelial cells from Vav1-cre; Scf^fl/− mice as compared to Scf^fl/− mice.
SCF abundance was assessed relative to actin by ImageJ software (n = 3
mice per genotype from 3 independent experiments). e–h, In the bone
marrow Vav1-Cre recombined in a minority of endothelial cells, including
some sinusoidal (arrows in h) and some arteriolar (arrowheads in h)
endothelial cells, that expressed little Scf-GFP by flow cytometry (f, g).
The data reflect mean ± s.d. from 3 mice per genotype in 3 independent
experiments. i–k, Vav1-Cre recombined inefficiently in liver endothelial
cells. Most Tomato+ cells in the liver of Vav1-cre; R26^tdTomato; Scf^GFP mice
were VE-cadherin+ and Scf-GFP+ (i; arrows in k) but these cells accounted
for only 26 ± 4.2% of Scf-GFP+ cells by flow cytometry (i, j) and confocal
microscopy (k, n = 3 mice from 3 independent experiments). Two-tailed
Student's t-tests were used to assess statistical significance.

Extended Data Figure 6 | EMH induced by Cy+G-CSF does not
significantly change the recombination pattern of Vav1-Cre in the
spleen, bone marrow, or liver but deletion of Scf from endothelial cells
in spleens with EMH reduces blood cell counts without affecting bone
marrow haematopoiesis. a, b, After EMH induced by Cy+21 d G-CSF,
Vav1-Cre-recombined cells were predominantly in the red pulp (a) and
co-localized with VE-cadherin+ cells (b) in the spleen. c–f, After EMH
induced by Cy+21 d G-CSF, Vav1-Cre-recombined cells remained rare
in the bone marrow (c, d) and liver (e, f; n = 3 mice from 3 independent
experiments). g–s, Vav1-cre; Cxcl12^fl/− mice and Cxcl12^fl/− controls were
treated with Cy+4–21 d G-CSF to induce EMH. Data show WBC (g),
RBC (h), and platelet (i) counts, spleen cellularity (j) and numbers of
HSCs (k), CMPs (l), GMPs (m) and MEPs (n) in the spleen as well as bone
marrow cellularity (o), and numbers of HSCs (p), CMPs (q), GMPs (r)
and MEPs (s) in one femur and one tibia. The data represent
mean ± s.d. from 3 (Cy+4 d G-CSF treatment) and 5 (Cy+21 d G-CSF
treatment) independent experiments. The number of mice per treatment
is indicated on each bar. Two-tailed Student’s t-tests were used to assess
statistical significance. t–z, Vav1-cre; Scf^fl/− mice and Scf^fl/+ and Scf^fl/−
controls were treated with Cy+4–21 d G-CSF to induce EMH. Data show
WBC (t), RBC (u), and platelet (PLT) (v) counts, numbers of CMPs (w),
GMPs (x) and MEPs (y) in the bone marrow as well as numbers of GMPs
in the spleen (z). Note that after 21 days of G-CSF both Scf^fl/− and
Vav1-cre; Scf^fl/− mice showed significantly lower CMP numbers relative
to Scf^fl/+ mice but their CMP numbers were not significantly different
from each other (w), indicating that CMP numbers in the bone marrow
were not influenced by Scf deletion from spleen endothelial cells. The
data represent mean ± s.d. from 3 (no treatment (NT)), 3 (4 days),
3 (8 days), and 8 (21 days) independent experiments. The number of
mice per treatment is indicated on each bar. The statistical significance
of differences among genotypes was assessed using repeated measures
one-way ANOVAs with Greenhouse-Geisser correction and Tukey’s
multiple comparison tests with individual variances computed for each
comparison. *P < 0.05, **P < 0.01, statistical significance relative to Scf^fl/+
controls. †P < 0.05, ††P < 0.01, statistical significance between Scf^fl/−
and Vav1-cre; Scf^fl/−.

Extended Data Figure 7 | Pregnancy induces EMH and the proliferation
of endothelial cells and stromal cells in the spleen without significantly
changing the recombination pattern of Tcf21-Cre/ER in the spleen,
bone marrow or liver. a–v, Pregnant female mice were at gestation day
18.5. a, b, H&E staining showed increased haematopoiesis in the spleens
of pregnant mice (b) as evidenced by the presence of megakaryocytes
(arrows; n = 3 mice per condition from 3 independent experiments). c–i,
Pregnancy significantly increased spleen cellularity (c), as well as the
numbers of HSCs (d), MEPs (e, f), Ter119+ erythroid cells (g) and
Gr-1+Mac-1+ myeloid cells (h) in the spleen but not the number of B220+
or CD3+ lymphoid cells (i). j, k, During pregnancy, Scf-GFP was expressed
by VE-cadherin+ endothelial cells and VE-cadherin− stromal cells (j)
while Cxcl12-DsRed was expressed by a subset of the VE-cadherin−Scf-GFP+
stromal cells (j, k). l, Whole-mount imaging of a thick spleen section
from a pregnant Scf^GFP; Cxcl12^DsRed mouse (representative images from
3 mice in 3 independent experiments). m, n, In the spleen, the numbers
of Scf-GFP+ cells (m) and Cxcl12-DsRed+ cells (n) significantly increased
during pregnancy. o, Endothelial and stromal cells in the spleen proliferated
during pregnancy. BrdU was administered to Scf^GFP mice or Cxcl12^DsRed mice
for 18 days, beginning in pregnant mice after the plug was observed.
The number of mice per treatment is indicated on each bar. Each
panel shows mean ± s.d. from 3 independent experiments. Two-tailed
Student's t-tests were used to assess statistical significance (**P < 0.01,
***P < 0.001). p–r, Pregnancy did not alter the general distribution (p),
perivascular localization (q) or surface marker expression (r; PDGFR-β+
and LepR−) of Tomato+ cells in the spleens of Tcf21^cre/ER; R26^tdTomato mice.
s, t, Tomato expression remained undetectable in the bone marrow of
pregnant Tcf21^cre/ER; R26^tdTomato mice. u, v, During pregnancy, Tcf21-Cre/ER
recombined in rare perivascular cells in the liver. p–v, n = 3 mice per
genotype from 3 independent experiments.

Extended Data Figure 8 | Conditional deletion of Cxcl12 from Tcf21+
stromal cells impairs EMH in the spleens of pregnant mice without
significantly affecting bone marrow haematopoiesis. a–x, Four-to-six-
month-old female mice that had been treated with tamoxifen at least
2 months before were mated with normal wild-type males. Normal females
and pregnant females at gestation day 18.5 were analysed. a–d, Conditional
deletion of Scf from Tcf21+ cells did not significantly affect the numbers
of GMPs (a), CMPs (b), MEPs (c), Ter119+ (erythroid), Gr-1+Mac-1+
(myeloid), CD3+ (T) and B220+ (B) cells (d) in one femur or one tibia.
e, f, Conditional deletion of Scf from Tcf21+ cells significantly reduced
GMPs (e), Ter119+ erythrocytes and Gr-1+Mac-1+ myeloid cells (f) in
the spleen. g, h, Conditional deletion of Scf from Tcf21+ cells did not
significantly affect WBC (g) or platelet counts (h). i–n, Conditional
deletion of Cxcl12 from Tcf21+ cells did not significantly affect bone
marrow cellularity (i), or the numbers of HSCs (j), GMPs (k), CMPs (l),
MEPs (m), Ter119+ (erythroid), Gr-1+Mac-1+ (myeloid), CD3+ (T)
and B220+ (B) cells (n) in the bone marrow. o–w, Spleen cellularity
(o) and numbers of HSCs (p), GMPs (q), CMPs (r), MEPs (s), Ter119+
(erythroid), Gr-1+Mac-1+ (myeloid), CD3+ (T) and B220+ (B) cells (t)
in the spleen and WBC (u), platelet (v) and RBC counts (w) in the blood.
x, Conditional deletion of Cxcl12 from Tcf21+ cells in the spleens of
pregnant mothers did not significantly affect fetal mass. The numbers of
mice per treatment are shown in each bar within each panel. Each panel
shows mean ± s.d. from 3 independent experiments. a–w, The statistical
significance of differences among genotypes was assessed using a repeated
measures one-way ANOVA with Greenhouse-Geisser correction along
with Tukey’s multiple comparison tests with individual variances. x,
The statistical significance of differences was assessed using a two-way
ANOVA. *P < 0.05, **P < 0.01, ***P < 0.001, statistical significance
relative to normal mice. †P < 0.05, ††P < 0.01, †††P < 0.001, statistical
significance between Scf mutant mice and control mice after bleeding.

Extended Data Figure 9 | Bleeding induces EMH and the proliferation
of endothelial cells and stromal cells in the spleen without significantly
changing the recombination pattern of Tcf21-Cre/ER in the spleen,
bone marrow, or liver. a, b, H&E staining showed an increase in
haematopoiesis in the spleen after repeated bleeding (b; bled) as evidenced
by the presence of megakaryocytes (arrows; n = 3 mice per condition from
3 independent experiments). c–i, Bleeding significantly increased spleen
cellularity (c), as well as the numbers of HSCs (d), MEPs (e, f), and the
numbers of Ter119+ erythroid cells (g) and Gr-1+Mac-1+ myeloid cells (h)
in the spleen but not the number of B220+ or CD3+ lymphoid cells (i).
j, k, After EMH induced by bleeding, Scf-GFP was expressed by VE-cadherin+
endothelial cells and VE-cadherin− stromal cells (j) while Cxcl12-DsRed
was expressed by a subset of the VE-cadherin−Scf-GFP+ stromal cells
(j, k). l, Whole-mount imaging of a thick spleen section from a Scf^GFP;
Cxcl12^DsRed mouse after bleeding (representative images from 3 mice in
3 independent experiments). m, n, The numbers of Scf-GFP+ cells (m)
and Cxcl12-DsRed+ cells (n) significantly increased upon bleeding.
o, Endothelial and stromal cells in the spleen proliferated after bleeding.
BrdU was administered to Scf^GFP mice or Cxcl12^DsRed mice for 15 days
beginning after the first bleeding. The numbers of mice per treatment
are shown in each bar in each panel. Each panel shows mean ± s.d. from
three independent experiments. Two-tailed Student’s t-tests were used to
assess statistical significance (**P < 0.01, ***P < 0.001). p–r, Bleeding did
not alter the general distribution (p), perivascular localization (q)
or surface marker expression (r; PDGFR-β+ and LepR−) of Tomato+
cells in the spleens of Tcf21^cre/ER; R26^tdTomato mice. s, t, Tomato expression
remained undetectable in bone marrow from Tcf21^cre/ER; R26^tdTomato mice
after bleeding. u, v, After bleeding, Tcf21-Cre/ER recombined only in
rare perivascular cells in the liver (p–v; n = 3 mice per genotype from
3 independent experiments).

Extended Data Figure 10 | Blood loss does not significantly change
the recombination pattern of Vav1-Cre in the spleen, bone marrow,
or liver; conditional deletion of Cxcl12 from Tcf21+ spleen stromal cells
in bled mice impairs EMH in the spleen without significantly affecting
bone marrow haematopoiesis. a–e’, Four-to-six-month-old mice with the
indicated genetic backgrounds were repeatedly bled over a 2-week period.
a–h, After EMH induced by blood loss, Vav1-Cre recombined efficiently
in VE-cadherin+ endothelial cells in the red pulp of the spleen (a–c)
but poorly in the bone marrow (d–f) and liver (g, h), similar to what we
observed under normal conditions (see Fig. 4a–c and Extended Data Fig. 5b)
(a–h; n = 3 mice from 3 independent experiments). i–n, Conditional
deletion of Scf using Tcf21-Cre/ER and/or Vav1-Cre did not significantly
affect the numbers of GMPs (i), CMPs (j), MEPs (k), Ter119+ (erythroid),
Gr-1+Mac-1+ (myeloid), CD3+ (T) and B220+ (B) cells (l) in the bone
marrow or WBC (m) or platelet counts in the blood (n). o, p, Conditional
deletion of Scf using Tcf21-Cre/ER and/or Vav1-Cre significantly reduced
GMPs (o), Ter119+ erythrocytes and Gr-1+Mac-1+ myeloid cells (p) in the
spleen. i–p, Data represent mean ± s.d. from 3 independent experiments.
q–v, Conditional deletion of Cxcl12 from Tcf21+ spleen cells did not
significantly affect bone marrow cellularity (q), or the numbers of HSCs
(r), GMPs (s), CMPs (t), MEPs (u), Ter119+ (erythroid), Gr-1+Mac-1+
(myeloid), CD3+ (T) and B220+ (B) cells (v) in one femur and one tibia
from bled mice. w–b’, Conditional deletion of Cxcl12 from Tcf21+ spleen
cells significantly reduced spleen cellularity (w), and the numbers of
MEPs (a’) and erythroid cells (b’) in the spleens of bled mice. Conditional
deletion of Cxcl12 from Tcf21+ spleen cells did not significantly affect the
numbers of HSCs (x), GMPs (y), or CMPs (z) in the spleens of bled mice.
c’–e’, Conditional deletion of Cxcl12 from Tcf21+ spleen cells significantly
reduced RBC (c’) but not WBC (d’) or platelet counts (e’) in the blood of
mice that had been repeatedly bled. q–e’, Data represent mean ± s.d. from
3 independent experiments. The numbers of mice per treatment are shown
in each bar in each panel. Statistical significance of differences among
genotypes was assessed using a repeated measures one-way ANOVA with
Greenhouse-Geisser correction along with Tukey’s multiple comparison
tests with individual variances. *P < 0.05, **P < 0.01, ***P < 0.001,
statistical significance relative to normal mice. †P < 0.05, ††P < 0.01,
†††P < 0.001, statistical significance between Scf mutant mice and control
mice after bleeding.

Extended Data Table 1 | Genes that are significantly more highly expressed by Scf-GFP+ stromal cells in spleen as compared to bone marrow

Spleen Scf-GFP+ | BM Scf-GFP+
Coagulation factor C homolog
Chemokine (C-C motif) ligand 21A
Actin, alpha 2, smooth muscle, aorta
Chemokine (C-X-C motif) ligand 13
Transcription factor 21
Chloride channel calcium activated 1
Ifi27l2a | Interferon, alpha-inducible protein 27 like 2A | 11.3 ± 0.2
Phospholamban
Prostate androgen-regulated mucin-like 1
Fibronectin 1
Collagen, type XIV, alpha 1 | 10.4 ± 0.2
Nuclear receptor subfamily 4, group A, 1
Angiotensin II receptor, type 1a
FBJ osteosarcoma oncogene
ATPase, Na+/K+ transporter, beta 2 | 10.6 ± 0.2
Tenascin XB
Myosin, heavy polypeptide 11, smooth muscle
Heat shock protein 1
Chloride channel calcium activated 2
Transgelin | Mm.283283 | 10.4 ± 0.5 | 7.3 ± 0.9 | 8.6

Significance was considered as >8 fold and P < 0.015. Data show mean ± s.d. for log2-transformed expression values (n = 3 independent samples per cell population). Maximal background expression
was considered to be 6.6 (log2(100)); all expression values below this threshold were set to 6.6 for purposes of calculating fold change. Two-tailed Student’s t-tests were used to assess statistical
significance. Data for bone marrow Scf-GFP+ stromal cells are from ref. 19.
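
The fold-change rule in the table note can be made concrete with a small Python sketch. It assumes, as the note implies, that fold change is computed from log2-transformed values as 2 raised to the difference between spleen and bone marrow, after raising any value below the stated background of 6.6 to that background. The function name and the example expression values are hypothetical.

```python
BACKGROUND_LOG2 = 6.6  # maximal background expression stated in the table note


def fold_change(log2_spleen: float, log2_bm: float, floor: float = BACKGROUND_LOG2) -> float:
    """Spleen versus bone marrow fold change from log2 expression values.

    Values below the background floor are raised to the floor first, so that
    noise in undetected genes cannot inflate the ratio.
    """
    spleen = max(log2_spleen, floor)
    bone_marrow = max(log2_bm, floor)
    return 2.0 ** (spleen - bone_marrow)


# Hypothetical values: a bone marrow signal of 3.0 is floored to 6.6 before the comparison.
print(fold_change(9.6, 3.0))  # approximately 8-fold rather than about 97-fold without the floor
print(fold_change(5.0, 4.0))  # both values at background, so the fold change is 1.0
```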

Epithelial-to-mesenchymal transition
is not required for lung metastasis but
contributes to chemoresistance


Kari R. Fischer, Anna Durrans, Sharrell Lee, Jianting Sheng, Fuhai Li, Stephen T. C. Wong, Hyejin Choi,
Tina El Rayes, Seongho Ryu, Juliane Troeger, Robert F. Schwabe, Linda T. Vahdat, Nasser K. Altorki,
Vivek Mittal & Dingcheng Gao


The role of epithelial-to-mesenchymal transition (EMT) in metastasis is a longstanding source of debate, largely owing
to an inability to monitor transient and reversible EMT phenotypes in vivo. Here we establish an EMT lineage-tracing 
system to monitor this process in mice, using a mesenchymal-specific Cre-mediated fluorescent marker switch system 
in spontaneous breast-to-lung metastasis models. We show that within a predominantly epithelial primary tumour, a 
small proportion of tumour cells undergo EMT. Notably, lung metastases mainly consist of non-EMT tumour cells that 
maintain their epithelial phenotype. Inhibiting EMT by overexpressing the microRNA miR-200 does not affect lung 
metastasis development. However, EMT cells significantly contribute to recurrent lung metastasis formation after 
chemotherapy. These cells survived cyclophosphamide treatment owing to reduced proliferation, apoptotic tolerance 
and increased expression of chemoresistance-related genes. Overexpression of miR-200 abrogated this resistance. 
This study suggests the potential of an EMT-targeting strategy, in conjunction with conventional chemotherapies, for
breast cancer treatment.


Despite significant advances in diagnosing and treating cancer, metas- 
tasis persists as a barrier to successful therapy and the main cause of 
cancer-related death¹. The EMT, wherein epithelial cells depolarize,
lose their cell-cell contacts, and gain an elongated, fibroblast-like 
morphology, is a potential mechanism by which tumour cells gain 
metastatic features. Functional implications of EMT include enhanced 
mobility, invasion and resistance to apoptotic stimuli??. Moreover, 
through EMT tumour cells acquire cancer stem cell, secondary 
tumour-initiating and chemoresistance properties*°. However, the 
importance of EMT in vivo is fiercely debated owing to major chal- 
lenges. Mesenchymal tumour cells cannot easily be distinguished from 
neighbouring stromal cells, and metastatic lesions mostly exhibit epi- 
thelial phenotypes’. The latter may be due to the hypothesized reverse 
process, mesenchymal to epithelial transition (MET), of the dissem- 
inated tumour cells. Studies have confirmed that mesenchymal cells 
are more capable of escaping the primary tumour, and of reaching 
distant sites, but it remains unproven that those same cells complete 
the full metastatic cascade in the form of a secondary nodule. Without 
evidence for the dissemination, colonization and metastatic outgrowth 
of mesenchymal tumour cells, the role of EMT will remain contested. 
In this study, we employed multiple transgenic mouse models, estab- 
lishing a cell lineage tracing approach together with characterization 
of epithelial and mesenchymal markers, to address the requirement 
of EMT in metastasis. The newly established transgenic model also 
provided us a unique opportunity to study the contribution of EMT 
to chemoresistance. 


EMT lineage tracing during metastasis 

To track EMT during metastasis in vivo, we generated a mesen- 
chymal-specific, Cre-mediated fluorescent marker switch strategy 
and established a triple-transgenic mouse model (MMTV-PyMT/ 
Rosa26-RFP-GFP/Fsp1-cre, tri-PyMT, Fig. 1a). In these mice, spon- 
taneous multifocal breast adenocarcinomas with distinct epithe- 
lial characteristics resembling the human luminal subtype develop 
in the mammary glands, and give rise to lung metastases with high 
penetrance®”. The Fsp1 (fibroblast specific protein 1) promoter drives 
expression of Cre recombinase in cells of mesenchymal lineage’”. 
A Cre-switchable fluorescent marker (lox-RFP-STOP-lox-GFP) is
ubiquitously expressed under the control of the β-actin promoter
in the Rosa26 locus¹¹. Fsp1 is the critical gatekeeping gene of EMT
initiation”, and its early activation in this process? allows for lin- 
eage tracing of tumour cells that have undergone EMT in vivo. 
Importantly, the colour switch system is irreversible—even if the 
mesenchymal tumour cells undergo MET in the metastatic organs", 
they would remain GFP+.
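
The logic of the irreversible colour switch can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the authors' analysis: the class and method names are hypothetical, and Cre activity is reduced to a single all-or-nothing event at EMT.

```python
from dataclasses import dataclass

@dataclass
class TumourCell:
    phenotype: str = "epithelial"  # current state: "epithelial" or "mesenchymal"
    reporter: str = "RFP"          # lineage mark: "RFP" until Cre acts, then "GFP"

    def undergo_emt(self):
        """EMT activates the mesenchymal promoter, so Cre excises RFP-STOP."""
        self.phenotype = "mesenchymal"
        self.reporter = "GFP"      # recombination is modelled as permanent

    def undergo_met(self):
        """MET restores the epithelial phenotype but cannot undo recombination."""
        self.phenotype = "epithelial"

cell = TumourCell()
cell.undergo_emt()
cell.undergo_met()
print(cell)  # epithelial again, yet still marked GFP
```

In this model a cell that is epithelial and RFP+ at the metastatic site has never passed through EMT, which is the inference the lineage tracing relies on.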

Primary breast tumours developed in the tri-PyMT mice at 8 weeks
of age. Immunofluorescence revealed that the majority of tumour cells,
identified by PyMT oncogene expression, were RFP positive (Fig. 1b).
These cells expressed E-cadherin and lacked vimentin (Extended Data
Fig. 1a), indicating their epithelial phenotype. The GFP+ cells detected
in the tumour bed were largely haematopoietic cells, as they were PyMT
negative and expressed CD45, a pan-haematopoietic marker (Extended
Data Fig. 1a), which is consistent with previous reports¹⁵. Altogether,
these data suggest that tumour cells maintain their original RFP expression
and epithelial phenotype in the primary tumour.

Figure 1 | Establishing an EMT lineage tracing system in triple-transgenic
mice. a, Schematic of triple-transgenic mice carrying polyoma middle-T
(PyMT) or Neu oncogenes driven by the MMTV promoter, Cre recombinase
under the control of the Fsp1 promoter, and floxed RFP-STOP followed
by GFP under control of the β-actin promoter in the Rosa26 locus. RFP+
epithelial tumour cells undergoing EMT permanently convert into GFP+
cells following activation of Fsp1-Cre. b, c, Immunofluorescent microscopy
images of tri-PyMT primary tumours (b) and lung metastases (met; c) (>10
sections from 3 mice), depicting RFP+ and GFP+ cells within the tumour
bed, and staining (white, pseudo-coloured) for PyMT. Scale bars, 100 μm.

Lung metastasis developed spontaneously in tri-PyMT lungs at 12 
weeks of age. Surprisingly, the PyMT-positive metastatic lesions were
RFP+ (Fig. 1c), and epithelial (E-cadherin+/vimentin−) (Extended Data
Fig. 1b), whereas only non-tumour cells expressed GFP. These results
indicate that tumour cells did not activate the mesenchymal-specific 
Fsp1 promoter, and retained their epithelial phenotype during metastasis. 
Thus, tumour cells may not undergo EMT to form metastatic lesions. 


Lineage tracing in additional models 

To exclude the possibility that the absence of EMT in metastasis may be 
unique to PyMT-driven breast tumours, we established EMT lineage 
tracing in the Neu oncogene-driven'® spontaneous breast cancer model 
(MMTV-Neu/Rosa26-RFP-GFP/Fsp1-Cre, tri-Neu mouse). The Neu
(ErbB-2) proto-oncogene is associated with 20–30% of human breast
cancers, and MMTV-Neu transgenic mice spontaneously develop
focal adenocarcinomas resembling human luminal phenotypes after
an extended latency at 6–8 months of age. Lung metastases are frequently
(72%) observed in these transgenic mice at 9–12 months of age.
Mirroring the tri-PyMT model, the Neu+ tumour cells in both primary
and metastatic lesions in tri-Neu mice were also RFP+ and epithelial
(E-cad+/Vim−) (Extended Data Fig. 2). Therefore, the absence of EMT
during metastasis formation is an oncogene-independent phenomenon,
manifesting in both PyMT and Neu-driven tumours.

To overcome the limitation of using solely Fsp1-Cre to indicate EMT,
we acquired the vimentin-CreER transgenic mouse, which successfully
traced mesenchymal lineage cells during liver fibrosis¹⁷, and generated
an additional EMT lineage tracing model (tri-PyMT/Vim mice,
MMTV-PyMT/Rosa26-RFP-GFP/Vimentin-creER). After continuous
induction of Cre activity by Tamoxifen injection (2 mg, intraperitoneal,
three times per week starting when the primary tumours appear
at 8 weeks of age) the majority of tumour cells in both the primary
and metastatic lesions in tri-PyMT/Vim mice were RFP+ (Extended
Data Fig. 3), suggesting an absence of vimentin promoter activation
during lung metastasis formation. EMT marker staining also revealed
the epithelial phenotype (E-cad+/Vim−) of the tumour cells in both
primary and metastatic lesions (Extended Data Fig. 3).

Figure 2 | The EMT lineage tracing system reports EMT in tumour cells
with high fidelity. a, Scatter plots from flow cytometry analysis of tri-PyMT
primary tumour cells, depicting GFP+ and RFP+ populations in the primary
tumour immediately after sorting of RFP+ cells (P1), and after ten passages
in culture with 10% FBS (P10 + 10% FBS). Numbers indicate the percentage
of RFP+ and GFP+ cells in the total population. b, Phase contrast/fluorescent
overlay image of tri-PyMT cells in culture. Scale bar, 50 μm. c, Western blot of
sorted RFP+ and GFP+ tri-PyMT cells for E-cadherin, vimentin and β-actin as
a loading control. Representative of two individual experiments. For original
gel images, see Supplementary Fig. 1. d, Representative imaging of GFP+ and
RFP+ tumour cells in primary tumours (PT) and lung metastases (LM) in the
orthotopic model (n = 8 mice). Arrow indicates scattered GFP+ EMT tumour
cells in the primary tumour. Scale bars, 100 μm (PT) and 50 μm (LM).
e, qRT-PCR analysis of relative expression of EMT markers in RFP+ and GFP+
cells sorted from orthotopic tri-PyMT primary tumours. Gapdh served as the
internal control. E-cadherin is encoded by the Cdh1 gene. Occludin is encoded
by the Ocln gene. Data are reported as mean ± s.e.m., n = 4 primary tumours.

Together, results from two oncogene-driven metastatic tumour mod- 
els (MMTV-PyMT and MMTV-Neu) and two independent mesen- 
chymal-specific reporters (Vim-Cre and Fsp1-Cre) suggest that EMT 
does not significantly contribute to the development of lung metastases. 


Validating EMT lineage tracing 

To evaluate the specificity and sensitivity of the EMT lineage tracing 
system, we established a cell line from the tri-PyMT breast tumours. In 
culture, RFP+ tri-PyMT cells switched their fluorescent marker expression
to GFP, as indicated by the presence of an RFP+/GFP+ double-positive
transitioning population (Fig. 2a). The cells were cultured in
10% FBS, and serum is known to be enriched for many EMT-promoting
factors including TGFs¹⁸. Moreover, addition of TGF-β1 in low-serum
conditions (2% FBS) yielded an increase in GFP+ cells
(Extended Data Fig. 4a). In concert with the fluorescent marker switch,
tri-PyMT cells changed their morphology from cobblestone-like clusters
of epithelial cells to dispersed spindle-shaped mesenchymal cells
(Fig. 2b). Reflecting the morphologic differences, the GFP+ cells were
more motile than RFP+ cells (Extended Data Fig. 4b).

The fidelity of the EMT lineage tracing system was confirmed by
analysis of EMT marker expression in sorted RFP+ and GFP+ tri-PyMT
cells. RFP+ cells expressed elevated levels of epithelial markers
including E-cadherin and Occludin, while GFP+ cells expressed
several mesenchymal markers including vimentin, FSP1, Twist, Zeb1
and Zeb2 as determined by quantitative reverse transcription PCR
(qRT-PCR) (Extended Data Fig. 4c). Both RFP+ and GFP+ tri-PyMT
cells expressed the PyMT oncogene. Consistently, western blot analysis
confirmed the differential expression of E-cadherin and vimentin in
RFP+ and GFP+ cells (Fig. 2c). Flow cytometry for E-cadherin revealed
that the majority of E-cadherin− cells were GFP+ (97.4%) (Extended
Data Fig. 4d). Of note, the E-cadherin+ cells were either RFP+ (93.6%)
or RFP+/GFP+ (6.0%), demonstrating that tumour cells switch their
fluorescent marker expression before the loss of epithelial markers, and
validating the early reporting of EMT in our system. These results confirm
that the Fsp1-Cre-mediated fluorescent marker switch in tumour
cells reports EMT with high fidelity and efficiency.
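
The reporter populations quoted above (RFP+, GFP+ and the RFP+/GFP+ transitioning cells) correspond to quadrants of a two-colour flow cytometry plot. A minimal sketch of such quadrant gating follows; the gate values and event intensities are hypothetical, and real gates would be set from unstained and single-colour controls as described in the Methods.

```python
from collections import Counter

def classify(rfp, gfp, rfp_gate=1000.0, gfp_gate=1000.0):
    """Assign one event to a quadrant using hypothetical intensity gates."""
    rfp_pos = rfp >= rfp_gate
    gfp_pos = gfp >= gfp_gate
    if rfp_pos and gfp_pos:
        return "RFP+/GFP+"   # transitioning: both reporters still detectable
    if gfp_pos:
        return "GFP+"
    if rfp_pos:
        return "RFP+"
    return "negative"

def percentages(events):
    """Percentage of events in each quadrant."""
    counts = Counter(classify(rfp, gfp) for rfp, gfp in events)
    total = len(events)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Made-up (RFP, GFP) fluorescence intensities for five events
events = [(5200, 40), (4800, 90), (3900, 2500), (60, 7100), (30, 20)]
result = percentages(events)
print(result)
```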


Rare EMT events in tumour progression 

In the triple-transgenic models, ubiquitous expression of GFP in the
tumour microenvironment precluded detection of potentially rare GFP+
tumour cells. To confine the fluorescence to tumour cells, we established
an orthotopic model by implanting purified RFP+ tri-PyMT
cells in wild-type mice (Extended Data Fig. 5a, b). Consistent with
observations in the triple-transgenic mice, primary tumours contained
RFP+ epithelial cells (Fig. 2d). However, GFP+ cells were detected, indicating
tumour EMT (Fig. 2d and Extended Data Fig. 5c, upper panel).
These cells lacked E-cadherin (Extended Data Fig. 5c, upper panel)
and made up 1.98 ± 1.40% (n = 6) of the total tumour cells (Extended
Data Fig. 5d). qRT-PCR analysis of EMT markers comparing sorted
RFP+ and GFP+ cells from the same primary tumour confirmed the
mesenchymal phenotype of the GFP+ cells (Fig. 2e). Importantly, these
GFP+ EMT tumour cells did not contribute to lung metastasis. Early
disseminated tumour cells detected in the lungs were epithelial and
RFP+ (Extended Data Fig. 5c, middle panel), and 28 lung nodules
detected in 8 mice maintained the epithelial phenotype (Fig. 2d and
Extended Data Fig. 5c, lower panel).

We also established an orthotopic tri-PyMT/Vim model, wherein 
Tamoxifen was administered directly after orthotopic injection to 
ensure immediate tracing of EMT events. Consistently, the majority of
tumour cells in both primary and metastatic tumours were RFP+ and
epithelial (Extended Data Fig. 6). Again, GFP+ EMT events (4.46 ± 1.0%
of total tumour cells, n = 3) were detected in the primary tumours.

To further dissect the metastatic cascade, we quantified the relative
numbers of RFP+ and GFP+ cells in the primary tumour, blood and
metastases of the tri-PyMT orthotopic model by flow cytometry. An
RFP to GFP ratio of ~100:1 in the primary tumour and ~15:1 in the
blood was observed (Extended Data Fig. 7a, b). However, the
enrichment of GFP+ cells in circulation did not translate into an advantage
in metastatic outgrowth, as the RFP:GFP ratio in the lung was
~150:1. Altogether, these findings are consistent with our observations
in the triple-transgenic models, suggesting that the majority of
breast tumour cells persist in an epithelial state during primary tumour
growth and lung metastasis formation.
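
The approximate RFP:GFP ratios reported above can be converted into GFP+ fractions to make the enrichment in blood, and its loss in the lung, explicit. This is a worked-arithmetic sketch only; it assumes every fluorescent tumour cell is either RFP+ or GFP+ and treats the rounded ratios as exact.

```python
def gfp_fraction(rfp_per_gfp):
    """GFP+ share of fluorescent tumour cells for an RFP:GFP ratio of N:1."""
    return 1.0 / (rfp_per_gfp + 1.0)

# Approximate RFP:GFP ratios quoted in the text
rfp_per_gfp = {"primary tumour": 100, "blood": 15, "lung": 150}
fractions = {site: gfp_fraction(r) for site, r in rfp_per_gfp.items()}

for site, f in fractions.items():
    print(f"{site}: GFP+ cells = {100 * f:.2f}% of fluorescent tumour cells")

blood_enrichment = fractions["blood"] / fractions["primary tumour"]
print(f"GFP+ enrichment in blood relative to primary tumour: {blood_enrichment:.1f}-fold")
```

On these figures the GFP+ share rises roughly sixfold in the circulation yet falls below its primary-tumour level in the lung.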


EMT inhibition and metastasis formation 

In spite of the extensive characterization of the EMT reporter system, 
there was still the distant possibility of our reporter failing to mani- 
fest all EMT events in vivo. Therefore, we sought to inhibit EMT and 
determine its impact on metastasis. We ectopically expressed miR-200, 
a well-known inhibitor of EMT that directly targets Zeb1 and Zeb2— 
the transcriptional repressors of E-cadherin'®”. We posited that sta- 
bly expressing miR-200 in tri-PyMT cells would block EMT and trap 
tumour cells in a permanent epithelial state. Compared with control 
cells, miR-200 overexpressing cells (Extended Data Fig. 7c) showed 
elevated expression of epithelial cell markers and reduced expression 
of mesenchymal markers (Extended Data Fig. 7d). As expected, overexpression
of miR-200 inhibited the RFP to GFP conversion (>90%
remaining RFP+, Fig. 3a). These results substantiate effective miR-200
suppression of EMT in the tri-PyMT cells.
Figure 3 | miR-200 inhibition of EMT in tri-PyMT cells did not impact
lung metastasis. a, Flow cytometry analysis of tri-PyMT control and
miR-200-expressing cells, indicating the percentage of RFP+ and GFP+ cells.
b, Representative histologic lung images in tri-PyMT control and miR-200-expressing
orthotopic mice (n = 5). Scale bar, 1.5 mm. c, Quantification
of lung metastasis formation (number of individual nodules) in tri-PyMT
control and miR-200-expressing tumour-bearing mice (n = 5). Data reported
as the mean ± s.e.m.


To explore the impact of inhibiting EMT on metastasis formation 
in vivo, we orthotopically injected miR-200 overexpressing tri-PyMT 
cells. We identified 18 metastases in 5 mice, a similar ratio to that 
observed in mice bearing control tri-PyMT cells (28 metastases in 
8 mice) (Fig. 3b, c). These results demonstrate that inhibition of EMT 
by miR-200 overexpression does not impair the ability of tumour cells 
to form distant lung metastases. 
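
The comparison of metastatic burden above rests on nodules per mouse rather than raw counts, since the two groups differ in size. A short worked calculation, using only the counts quoted in the text:

```python
def nodules_per_mouse(nodules, mice):
    """Mean number of lung nodules per mouse in a group."""
    return nodules / mice

mir200 = nodules_per_mouse(18, 5)    # miR-200 overexpressing tri-PyMT cells
control = nodules_per_mouse(28, 8)   # control tri-PyMT cells

print(f"miR-200: {mir200:.1f} nodules per mouse")
print(f"control: {control:.1f} nodules per mouse")
```

This gives group means only; it says nothing about the variation between individual mice, which is shown in Fig. 3c.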


EMT is involved in chemoresistance 

Emerging evidence suggests a molecular and phenotypic associ- 
ation between EMT and chemoresistance in several cancers*~”?. 
Compellingly, residual breast cancers following chemotherapy display 
a mesenchymal phenotype and tumour-initiating features”*. To deter- 
mine if the acquisition of chemoresistance induces specific molecular 
changes consistent with EMT, we evaluated the orthotopic tri-PyMT 
model under chemotherapy. Animals with established primary tumours 
were treated with cyclophosphamide (CTX), a commonly used drug in
breast cancer treatment²⁵ (100 mg kg⁻¹, once per week, for two weeks
prior to, and two weeks after, surgery; Fig. 4a). The tumours responded to
chemotherapy, manifesting a 60% reduction in growth and markedly
enhanced apoptotic activity (Extended Data Fig. 8a–c). Of note, the
RFP+ cells were more proliferative and more apoptotic than
GFP+ cells in CTX-treated mice (Extended Data Fig. 8d–g), suggesting
that GFP+ cells have reduced susceptibility to chemotherapy. However,
in the primary tumour, the GFP+ cell percentage remained static under
CTX treatment (Extended Data Fig. 8h).

Remarkably, in the early metastatic lungs (four weeks after tumour 
inoculation), flow cytometry analysis revealed a 2.7:1 ratio of GFP+ to
RFP+ cells in CTX-treated mice (Fig. 4b). Subsequently at four weeks
after cessation of treatment, a notable contribution of GFP+ tumour
cells was detected in 5 out of 17 metastatic lesions (Fig. 4a). This is in
contrast to untreated mice, where all metastatic lesions were derived
from RFP+ cells (Fig. 2d), suggesting that the EMT process may be
involved in metastatic outgrowth in the context of chemotherapy. 

To evaluate the effects of CTX on the EMT and non-EMT cell 
populations, sorted GFP+ and RFP+ cells were incubated with CTX
in vitro; the GFP+ cells were markedly more resistant to both short- and
long-term treatment (Fig. 4c and Extended Data Fig. 9a, b). The selective
advantage of mesenchymal tumour cells in the context of chemotherapy 


© 2015 Macmillan Publishers Limited. All rights reserved 


[Figure 4 image, panels a–f. Recoverable labels: orthotopic injection of RFP+ tri-PyMT cells; PT removal; lung metastasis (GFP/RFP/DAPI); pre-injection; tail vein injection; CTX treatment; Control versus CTX; CTX (μM); RFP and GFP.]


Figure 4 | EMT tumour cells are resistant to chemotherapy. a, Schema
of CTX treatment in tri-PyMT orthotopic model. Mice bearing an RFP+
primary tumour were treated with CTX (100 mg kg⁻¹, once per week, for
4 weeks, as indicated by blue arrows). After 2 weeks of treatment, primary
tumour (PT) was removed (black arrow). Lung metastasis growth was
permitted for 4 weeks post CTX treatment. Fluorescent imaging of lungs
revealed the contribution of GFP+ tumour cells to lung metastases (n = 9
mice). b, Ratio of GFP+ to RFP+ cells in early metastatic lungs (4 weeks
post orthotopic injection) of untreated control and CTX-treated mice
as quantified by flow cytometry (n = 4, *P < 0.05). Data reported as the
mean ± s.e.m. c, Apoptosis (as measured by Annexin binding) of RFP+
and GFP+ tri-PyMT cells treated with CTX (n = 2 biological replicates).
d, Flow cytometry scatter plot showing the proportions of RFP+ and
GFP+ tri-PyMT cells before intravenous injection. Mice were treated
with CTX (100 mg kg⁻¹ per week for 3 weeks, n = 5 mice per group).
e, Quantification of flow cytometry data showing the percentage of RFP+
and GFP+ tumour cells (red and green bars, respectively) of total cells in
the lung of control and CTX-treated mice (n = 5 mice per group, *P < 0.05).
f, Quantification of flow cytometry data showing the ratio of GFP+ to
RFP+ cells in lungs of control and CTX-treated mice. Black line represents
the starting ratio of GFP+ to RFP+ cells before injection as derived from
the data in Fig. 4d (*P < 0.05). Data reported as the mean ± s.e.m.


was then corroborated by a competitive survival assay in vivo (Fig. 4d). 
Mice were injected intravenously with an equivalent number of RFP+
and GFP+ cells, and immediately received CTX (100 mg kg⁻¹, once per
week). After three weeks, lungs were harvested and the ratio of RFP+
and GFP+ cells was assessed by flow cytometry. CTX significantly
inhibited outgrowth of lung metastasis from both RFP+ and GFP+ cells
(Fig. 4e). The untreated lungs were morbidly overwhelmed with
tumours, with nearly 80% of the tumour cells detected as RFP+.
Conversely, in CTX-treated mice, more than 60% of the surviving
tumour cells were GFP+, producing a significantly higher ratio of
GFP:RFP cells in these mice (Fig. 4f). These results indicate that GFP+
EMT cells are more resistant to chemotherapy both in vitro and in vivo.
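
The shift in the GFP:RFP ratio in this competitive assay can be worked through from the percentages quoted above. The sketch below is illustrative: it assumes every tumour cell is either GFP+ or RFP+, and it uses the boundary values from the text ("nearly 80%" RFP+ untreated, "more than 60%" GFP+ after CTX) as if they were exact.

```python
def gfp_to_rfp_ratio(gfp_percent):
    """GFP:RFP ratio when GFP+ and RFP+ cells together make up 100%."""
    rfp_percent = 100.0 - gfp_percent
    return gfp_percent / rfp_percent

untreated = gfp_to_rfp_ratio(20.0)   # about 80% RFP+ leaves about 20% GFP+
treated = gfp_to_rfp_ratio(60.0)     # lower bound: more than 60% GFP+ after CTX

print(f"untreated GFP:RFP = {untreated:.2f}")
print(f"CTX-treated GFP:RFP >= {treated:.2f}")
print(f"shift of at least {treated / untreated:.0f}-fold towards GFP+ cells")
```

Because the injected populations were equivalent in number, a ratio below 1 in untreated lungs and above 1 after CTX is what the competitive design is meant to reveal.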

Immunostaining revealed that in the untreated mice, both RFP+
and GFP+ cells formed epithelial metastatic lesions (E-cad+/Vim−)
(Extended Data Fig. 9c). Given the initial mesenchymal phenotypes
of GFP+ cells before injection, this suggests that the GFP+ tumour
cells have undergone MET in the metastatic organ. On the other hand,
in CTX-treated mice the majority of surviving tumour cells were
scattered mesenchymal GFP+ cells (E-cad−/Vim+) (Extended Data
Fig. 9d). Together, these observations suggest that EMT tumour cells
that sustain a mesenchymal phenotype are resistant to chemotherapy.

Figure 5 | miR-200 overexpression abrogates CTX resistance.
a, Sensitivity of control and miR-200-expressing tri-PyMT tumour cells
to CTX treatment as measured by CellTiter-Glo. n = 4 biological replicates
per condition. b, Representative histologic lung images in tri-PyMT control
and miR-200-expressing tumour-bearing mice treated with CTX (n = 5).
Scale bar, 1.5 mm. c, Quantification of lung metastasis formation (number
of individual nodules) in CTX-treated tri-PyMT control and miR-200-expressing
tumour-bearing mice (n = 5). Data reported as the mean ± s.e.m.

To begin to investigate the molecular underpinnings of mesenchymal
tumour cell resistance, we analysed the transcriptomic changes of
EMT tumour cells. We sorted RFP+ and GFP+ cells and performed
RNA-sequencing analysis (Supplementary Information Table 1). In
addition to the expected changes in EMT marker expression (Extended 
Data Fig. 10a), the expression of many cell-proliferation-related 
genes was reduced in GFP+ cells (Extended Data Fig. 10b), mirroring
their phenotype of reduced proliferation in vivo. The GFP+ cells
also showed increased expression of proven chemoresistance-related
factors including IL6, Periostin, Enpp2 and Pdgfr?***. Additionally, the 
CTX-treated GFP+ cells elevated their expression of many drug-metabolizing
enzymes including drug transporters (Abcb1a, Abcb1b
and Abcc1), aldehyde dehydrogenases (ALDHs), cytochrome P450s,
and glutathione-metabolism-related enzymes (Extended Data Fig. 10c).
The main toxicity of CTX is due to its metabolite phosphoramide mustard,
which is only formed in cells with low levels of ALDHs. ALDH
converts the CTX metabolite aldophosphamide into the non-toxic
carboxyphosphamide”’. In accordance with the transcriptomic data, 
GFP+ cells had significantly higher ALDH activity compared with
RFP+ cells (Extended Data Fig. 10d). These properties of reduced proliferation,
increased apoptotic resistance, and upregulation of chemoresistance
and drug-metabolizing genes in GFP+ EMT tumour cells may
contribute to their insensitivity to CTX. Notably, GFP+ cells were also
refractory to other commonly used chemotherapies including doxorubicin,
paclitaxel, and fluorouracil treatment (Extended Data Fig. 10e).

To demonstrate that the EMT is required for the generation of CTX 
resistance, we first tested in vitro the effect of treatment on control 
and miR-200 overexpressing tri-PyMT cells. With increasing concen- 
trations of CTX, the miR-200 cells were significantly more suscepti- 
ble to therapy (Fig. 5a). We then expanded upon this finding in vivo, 
establishing orthotopic control and miR-200 primary tumours, and 
applying the pre- and post-surgery CTX regimen. We found that by 
blocking EMT in tumour cells, we effectively ablated metastatic growth 
(Fig. 5b, c). Thus, EMT contributes to the development of chemo- 
resistant metastasis. 


Discussion 

Using two independent EMT lineage tracing strategies in two dis- 
parate oncogene-driven autochthonous models of breast cancer, we 
demonstrated that lung metastases are derived from non-EMT tumour 
cells, contradicting the original EMT/MET hypothesis**”. In a tracing
system similar to our own, EMT was identified in primary tumours, 
but the mesenchymal lineage status of the metastatic nodules was 
not pursued*!. Ultimately in our models we found that tumour cells 
disseminate and form metastases while persisting in their epithelial 
phenotype, in accordance with a recent study*”. To underline that 
EMT is not required for metastasis, overexpression of miR-200—a 
microRNA that is incongruously associated with both reduced 
invasion!*”° and increased metastasis**—resulted in combined sup- 
pression of the EMT-promoting transcription factors Snail1/2, Twist, 
Zeb1 and Zeb2, but had no effect on metastasis. Given that both epithe- 
lial and mesenchymal tumour cells have the potential to disseminate, 
it is plausible that the larger fraction of highly proliferative epithelial 
cells outcompete the minor EMT tumour cell population in generating 
macrometastatic lesions. 

Until now, the majority of data connecting EMT with chemore- 
sistance was largely derived from in vitro studies, or clinical prognostic 
data. Here we demonstrate that highly proliferative non-EMT cells are 
sensitive to chemotherapy, and observe the emergence of recurrent 
EMT-derived metastases after treatment. There is a great emphasis 
towards developing EMT-targeting therapies**”*, and our studies sug- 
gest that while EMT blockade may not affect metastasis formation, 
specifically targeting EMT tumour cells will be synergistic with conven- 
tional chemotherapy. Thus, our EMT lineage tracing system provides a 
unique preclinical platform to develop combination therapies that will 
eliminate both populations, and combat chemoresistance. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Animals. Wild-type C57BL/6 and FVB/n mice, and transgenic mice with ACTB-
tdTomato-eGFP (stock no. 007676), Fsp1-Cre (stock no. 012641), MMTV-PyMT
(stock no. 002374), and MMTV-Neu (stock no. 002376) were obtained from The
Jackson Laboratory. The vimentin-CreER mouse was a kind gift from the laboratory
of R. F. Schwabe at Columbia University. CB-17 SCID mice were obtained
from Charles River Laboratories. All mouse strains obtained were bred in the 
animal facility at Weill Cornell Medical College. All animal work was conducted 
in accordance with a protocol approved by the Institutional Animal Care and Use 
Committee at Weill Cornell Medical College. 

The ACTB-tdTomato-EGFP and Fsp1-Cre mice were bred together 
to obtain double transgenic mice and then bred with MMTV-PyMT or 
MMTV-Neu mice to obtain the tri-PyMT and tri-Neu triple-transgenic mice, 
respectively. Double transgenic male mice carrying ACTB-tdTomato-eGFP 
and MMTV-PyMT were crossed with the vimentin-CreER mice to obtain 
the tri-PyMT/Vim triple-transgenic mice. Genotyping for each transgenic 
line was performed following the standardized protocols as described in the 
website of The Jackson Laboratory. Genotyping for vimentin-CreER was done 
using forward primer 5’-CCCCTTCCTCACTTCTTTCC and reverse primer 
5’-ATGTTTAGCTGGCCCAAATG. 

Tamoxifen injection. To induce vimentin-CreER activity in the tri-PyMT/Vim
mice, Tamoxifen (Sigma-Aldrich, 2 mg per mouse, dissolved in corn oil) was 
administered through intraperitoneal injections, three times per week starting 
when the primary tumours appear (at 8 weeks of age) and continuing for 6 weeks 
until metastasis developed in the lung. 

Establishing tri-PyMT cell line. The primary tumour of the tri-PyMT mouse 
(12-week-old female) was surgically removed under sterile conditions. Tumour 
tissue was sliced into ~1 mm³ blocks and implanted into the fat pad (no. 4 on the
right side) of CB-17 SCID mice. The secondary tumour was used to establish the 
tri-PyMT cell line, eliminating the contamination of fluorescent positive stromal 
cells in the tumour tissue from tri-PyMT transgenic mice. 

Tumour tissue was minced and digested with an enzyme cocktail (Collagenase 
A, elastase, and DNase I, Roche Applied Science) in HBSS buffer at 37 °C for 
30 min. The cell suspension was strained through a 40-μm cell strainer (BD
Biosciences). Cells were washed with PBS three times and loaded into the Aria
III cell sorter (BD Biosciences). The sorted RFP+ cells were cultured in DMEM
supplemented with 10% fetal bovine serum. The PyMT oncogene expression in 
the established cell line was confirmed by RT-PCR (Extended Data Fig. 4c). The 
tumorigenic ability of these cells was confirmed throughout the study. 

To determine EMT induction by TGF-β, cells were cultured for one week in
DMEM with 2% FBS and 2 ng ml⁻¹ TGF-β1 (R&D Systems). The GFP+ cell ratio
was quantified by flow cytometry.

To generate the miR-200 overexpressing cell line, a pLenti 4.1 Ex miR-200b- 
200a-429 construct”, was obtained from Addgene. To eliminate the contamination 
of fluorescent marker expression in targeted cells, the GFP gene in this construct 
was removed by BstBI/XbaI digestion followed by blunt-end self-ligation. Lentivirus
was packaged by co-transfection of the pLenti-miR-200 construct and packag- 
ing plasmids into HEK293T cells. tri-PyMT cells (passage 2) were infected with 
the lentivirus. Infected cells (tri-PyMT miR-200) were selected by culturing with
puromycin (2 μg ml⁻¹) for 14 days. A control tri-PyMT cell line was generated by
infecting cells with lentivirus carrying the puromycin resistance gene, following 
the same procedure in parallel. 

Orthotopic breast tumour model. To establish an orthotopic breast tumour 
model, we first purified RFP+ cells from passages 10–15 of tri-PyMT cell culture by
FACS. The purified RFP* tri-PyMT cells (1 x 10° cells with purity >99%, Extended 
Data Fig. 5a) were injected into the mammary fat pad of 8-week-old female CB-17 
SCID mice. The growth of the primary tumour was monitored by external calliper 
measurement once a week. In approximately 4 weeks, the primary tumour was 
surgically removed and the incision was closed with wound clips. The tumour 
size did not exceed 5% of total body weight as permitted in the IACUC protocol. 
Animals were euthanized 4 weeks after primary tumour removal to analyse the 
development of pulmonary metastasis. For animals subjected to chemotherapy, 
cyclophosphamide (CTX, Sigma-Aldrich, 100 mg kg⁻¹) was administered once
per week, for 2 weeks prior and 2 weeks after surgery. 

Tissue processing, immunofluorescence and microscopy. The harvested primary 
tumours and PBS-perfused lungs bearing metastases were fixed in 4% paraform- 
aldehyde overnight, followed by 30% sucrose for 2 days, and then embedded in 
Tissue-tek O.C.T. embedding compound (Electron Microscopy Sciences). Serial 
sections (10 μm, at least 10 sections) were prepared for histological analysis by
haematoxylin and eosin staining, and immunofluorescent staining following 
standardized protocols. 

Primary antibodies used in this study include CD45 (30-F11, BioLegend), 
E-cadherin (DECMA-1, BioLegend), vimentin (sc-7557, Santa Cruz), PyMT
(ab15085, Abcam), Neu (sc-284, Santa Cruz), Ki67 (ab15580, Abcam), and active
caspase-3 (C92-605, BD Pharmingen). Primary antibodies were directly conjugated 
to Alexa Fluor 647 using an antibody labelling kit (Invitrogen) performed as per 
manufacturer's instructions and purified over BioSpin P30 columns (Bio-Rad). 
GFP+ and RFP+ cells were detected by inherent fluorescence.

Fluorescent images were obtained using a computerized Zeiss fluorescent 
microscope (Axiovert 200M), fitted with an apotome and an HRM camera. Images 
were analysed using Axiovision 4.6 software (Carl Zeiss). 
Flow cytometry and cell sorting. For the metastatic lungs and primary tumours, 
cell suspensions were prepared by digesting tissues with an enzyme cocktail 
(collagenase A, elastase, and DNase I, Roche Applied Science) in HBSS buffer at 
37°C for 30 min. For cultured cells, cells were collected through trypsinization. A 
single-cell suspension was prepared by filtering through a 30-μm cell strainer (BD 
Biosciences). Then cells were stained following a standard immunostaining protocol. 
In brief, cells were pre-blocked with 2% FBS plus Fc block (CD16/CD32, 1:30, BD 
Biosciences) and then incubated with the primary antibody against E-cadherin 
(DECMA-1, BioLegend). SYTOX Blue (Invitrogen) was added to the staining tube 
in the last 5 min to facilitate the elimination of dead cells. GFP+ and RFP+ cells 
were detected by their intrinsic signals. The stained samples were analysed using 
the LSRII flow cytometer coupled with FACS Diva software (BD Biosciences). Flow 
cytometry analysis was performed using a variety of controls including isotype anti- 
bodies, unstained and single-colour stained samples for determining appropriate 
gates, voltages and compensations required in multivariate flow cytometry. 

For sorting live cells back for further culturing or injection into animals, we 
used the Aria II cell sorter coupled with FACS Diva software (BD Biosciences). The 
preparation of cells for sorting was performed under sterile conditions. The purity 
of subpopulations after sorting was confirmed by analysing post-sort samples in 
the sorter again. 

Quantitative RT-PCR analysis. Total RNA was extracted using the RNeasy Kit 
(Qiagen), and miRNA using the mirVana miRNA isolation kit (Life Technologies). 
RNA was converted to cDNA using qScript cDNA SuperMix (Quanta Biosciences). 
qPCR was performed with the appropriate primers (sequences listed below) and 
iQ SYBR Green master mix (Bio-Rad) on a Bio-Rad CFX96 Real Time System 
(Bio-Rad) coupled with Bio-Rad CFX Manager software. PCR protocol: initial 
denaturation at 95 °C for 3 min; 40 cycles of 95 °C for 20 s, 60 °C for 30 s and 
72 °C for 30 s; final extension at 72 °C for 5 min; followed by melt curve 
analysis. Primers used are as follows: 
GAPDH, forward, 5′-GGTCCTCAGTGTAGCCCAAG-3′; reverse, 5′-AATGTGTCCGTCGTGGATCT-3′; 
Cdh1 (E-cadherin), forward, 5′-ACACCGATGGTGAGGGTACACAGG-3′; reverse, 5′-GCCGCCACACACAGCATAGTCTC-3′; 
Ocln, forward, 5′-TGCTAAGGCAGTTTTGGCTAAGTCT-3′; reverse, 5′-AAAAACAGTGGTGGGGAACGTG-3′; 
Vim, forward, 5′-TGACCTCTCTGAGGCTGCCAACC-3′; reverse, 5′-TTCCATCTCACGCATCTGGCGCTC-3′; 
Cdh2 (N-cadherin), forward, 5′-AAAGAGCGCCAAGCCAAGCAGC-3′; reverse, 5′-TGCGGATCGGACTGGGTACTGTG-3′; 
FSP-1, forward, 5′-CCTGTCCTGCATTGCCATGAT-3′; reverse, 5′-CCCACTGGCAAACTACACCC-3′; 
Snai1, forward, 5′-ACTGGTGAGAAGCCATTCTCCT-3′; reverse, 5′-CTGGCACTGGTATCTCTTCACA-3′; 
Snai2, forward, 5′-TTGCAGACAGATCAAACCTGAG-3′; reverse, 5′-TGTTTATGCAGAAGCGACATTC-3′; 
Twist1, forward, 5′-AGCTACGCCTTCTCCGTCTG-3′; reverse, 5′-CTCCTTCTCTGGAAACAATGACA-3′; 
Zeb-1, forward, 5′-GATTCCCCAAGTGGCATATACA-3′; reverse, 5′-TGGAGACTCCTTCTGAGCTAGTG-3′; 
Zeb-2, forward, 5′-TGGATCAGATGAGCTTCCTACC-3′; reverse, 5′-AGCAAGTCTCCCTGAAATCCTT-3′; 
PyMT, forward, 5′-ACTGCTACTGCACCCAGACA-3′; reverse, 5′-CTGGAAGCCGGTTCCTCCTA-3′; 
GFP, forward, 5′-CCACATGAAGCAGCACGACT-3′; reverse, 5′-GGGTCTTGTAGTTGCCGTCG-3′; 
RFP, forward, 5′-AGCGCGTGATGAACTTCGAG-3′; reverse, 5′-CCGCGCATCTTCACCTTGTA-3′. 
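
A primer list such as the one above can be sanity-checked for length and GC content before ordering. The sketch below is illustrative only and is not part of the published protocol; the helper names `gc_fraction` and `summarize`, and the choice of the two pairs shown, are assumptions made for the example.

```python
# Illustrative primer check (hypothetical helper names, not from the paper).
PRIMERS = {
    "GAPDH": ("GGTCCTCAGTGTAGCCCAAG", "AATGTGTCCGTCGTGGATCT"),
    "PyMT": ("ACTGCTACTGCACCCAGACA", "CTGGAAGCCGGTTCCTCCTA"),
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(primers):
    """Return (gene, orientation, length, GC fraction) for every primer."""
    rows = []
    for gene, (forward, reverse) in primers.items():
        for label, seq in (("forward", forward), ("reverse", reverse)):
            rows.append((gene, label, len(seq), round(gc_fraction(seq), 2)))
    return rows


if __name__ == "__main__":
    for row in summarize(PRIMERS):
        print(row)
```

Acceptable length and GC ranges depend on the assay design rules in use, so none are hard-coded here.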

RNA-sequencing analysis. Total RNA was extracted from sorted RFP+ and GFP+ 
tri-PyMT cells with the RNeasy Kit (Qiagen). RNA-seq libraries were constructed 
and sequenced following standard protocols (Illumina). Single-end RNA-seq reads 
were mapped to UCSC mouse genome (GRCm38/mm10) using Tophat2. FPKM 
values for each gene were estimated by Cufflinks and statistical analysis was done 
using Cuffdiff2. Heat maps for differentially expressed genes with adjusted P values 
<0.05 were drawn using gplots R package. 
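
For readers unfamiliar with the unit, the sketch below shows the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments). It is a simplified illustration only: Cufflinks estimates abundances with a statistical model that also handles isoforms and multi-mapped reads, and the function name and example numbers here are hypothetical.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM: fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments on a 2,000-bp transcript, 20 million mapped fragments.
    print(fpkm(500, 2000, 20_000_000))
```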

Western blot analysis. Cells were homogenized in 1× RIPA lysis buffer (Millipore) 
with protease inhibitors (Roche Applied Science). Samples were boiled in 1× 
Laemmli buffer and 10% β-mercaptoethanol, and loaded onto 12% gradient Tris- 
glycine gels (Bio-Rad). Western blotting was performed using antibodies specific 
for E-cadherin (clone DECMA-1), vimentin (clone RV202, BD Pharmingen), and 
β-actin (clone AC-15, Sigma-Aldrich). 

Cell apoptosis and viability assays. To determine apoptosis of RFP+ and 
GFP+ cells, tri-PyMT cells (Passage 10) were seeded on adherent six-well plates 
(1 x 10° cells), and treated with 4-hydroperoxy cyclophosphamide (Santa Cruz) 
for 48 h. After treatment, cells were trypsinized and stained with APC-conjugated 
Annexin V (BD Biosciences) and SYTOX Blue (Invitrogen) for apoptotic-cell 
labelling. The stained cells were analysed in the LSRII flow cytometer to quantify 
the percentage of apoptotic, dead, and live RFP+ and GFP+ cells by FACS Diva 
software. To determine the viability of tri-PyMT control and miR-200-expressing 
cells treated with CTX, cells were plated in 96-well adherent black-walled plates 
(1 x 10‘ cells), and treated with 4-hydroperoxy cyclophosphamide for 48h. After 
treatment, cell viability was measured with the CellTiter-Glo Luminescent Cell 
Viability Assay (Promega). 

Cell migration assay. 1 x 10° tri-PyMT cells were seeded in a six-well plate. 
Real-time images of cells (including phase, GFP and RFP channels) were taken 
under a computerized Zeiss microscope (Axiovert observation) every 10 min for 
10 h. Movements of individual cells (>10 RFP+ and >10 GFP+ cells in each field, 
>2 fields were analysed) were tracked with ImageJ software, and the distance 
travelled during that time was measured as indicated. 
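
The distance measurement described above can be sketched as the sum of straight-line steps between consecutive tracked positions. This is an illustration under stated assumptions, not the authors' ImageJ workflow: the function names, the coordinate units and the example track are hypothetical, and only the 10-min frame interval comes from the text.

```python
import math


def path_length(track):
    """Sum of straight-line distances between consecutive (x, y) positions."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))


def mean_speed(track, interval_min=10.0):
    """Average speed in distance units per minute, assuming a fixed frame interval."""
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    return path_length(track) / (interval_min * (len(track) - 1))


if __name__ == "__main__":
    # Hypothetical three-frame track; coordinates in arbitrary units (for example micrometres).
    example_track = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
    print(path_length(example_track), mean_speed(example_track))
```

Summing step lengths gives total path length rather than net displacement; which of the two a study reports should be checked against its figure legend.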


ALDH activity assay. RFP+ and GFP+ tri-PyMT cells (1 x 10° cells each) were 
freshly sorted from culture by FACS and then homogenized in cold ALDH Assay 
buffer provided in the ALDH Activity Colorimetric Assay Kit (Biovision Inc.). 
Following the protocol, ALDH substrate and acetaldehyde were added. ALDH 
activities in samples were measured by OD at 450 nm in kinetic mode (every 3 min 
for 60 min). 
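
A kinetic read-out of this kind (OD at 450 nm every 3 min for 60 min) is typically summarised as a rate of absorbance change. The sketch below fits a least-squares slope to such a time series. It is an assumption about how the data might be reduced, not the kit's documented calculation, and the readings are synthetic.

```python
def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Reading schedule from the text: every 3 min for 60 min gives 21 time points.
TIMES_MIN = list(range(0, 61, 3))

if __name__ == "__main__":
    # Synthetic, perfectly linear readings used only to exercise the function.
    synthetic_od = [0.10 + 0.005 * t for t in TIMES_MIN]
    print(linear_slope(TIMES_MIN, synthetic_od))  # OD units per minute
```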

Statistical analysis. To determine the sample size of animal experiments, we used 
power analysis assuming $\frac{\text{difference in means}}{\text{standard deviation}} > 2.5$. 
Therefore, all animal experiments 
were conducted with >5 mice per group to ensure adequate power between groups 
by two-sample t-test comparison. Animals were randomized within each experi- 
mental group. No blinding was applied in performing experiments. Results are 
expressed as mean ± s.e.m. Data distribution in groups and significance between 
different treatment groups were analysed using the Mann-Whitney U-test in 
GraphPad Prism software. P values <0.05 were considered significant. Error bars 
depict s.e.m., except where indicated otherwise. 
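
The sample-size reasoning above can be illustrated with the normal-approximation formula for a two-sample comparison, n per group ≈ 2((z₁₋α/₂ + z_power)/d)², where d is the standardized effect size. This is a sketch only: the two-sided α of 0.05 and the power of 0.8 are assumptions (the text states only the effect size), and the normal approximation slightly underestimates the n that an exact t-based calculation would give for small groups.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group for a two-sample comparison."""
    if effect_size <= 0:
        raise ValueError("effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Effect size of 2.5 as stated in the text; alpha and power are assumed values.
    print(n_per_group(2.5))
```

Under these assumed settings the approximation returns a group size smaller than five, which is consistent with the group sizes reported in the text being adequately powered.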

Extended Data Figure 1 | Characterization of the primary tumour and 
lung metastasis of tri-PyMT mice. a, b, Sections of primary tumours 
(a) and lungs (b) from tri-PyMT mice were immunostained for E-cadherin 
(E-cad, top), vimentin (Vim, middle) and CD45 (bottom) in white 
pseudo-colour. Representative images are shown (n > 5 mice). Note the 
co-localization of PyMT with RFP, and CD45 with GFP (as indicated by 
arrows), in both primary tumours and lung metastases. 

Tri-Neu mouse (MMTV-Neu/FSP1-Cre/Rosa-RGFP) 

Extended Data Figure 2 | Characterization of the primary tumour and 
lung metastasis of tri-Neu mice. Sections of primary tumours (left panel) 
and lungs (right panel) from tri-Neu mice were immunostained for CD45, 
Neu, E-cadherin, and vimentin (in white pseudo-colour). Representative 
images are shown (n > 5 mice). Note that both primary tumours and lung 
metastases are largely composed of epithelial RFP+ tumour cells. 

Tri-PyMT/Vim mouse (MMTV-PyMT/Vim-creER/Rosa-RGFP) 

Extended Data Figure 3 | Characterization of the primary tumour 
and lung metastasis of tri-PyMT/Vim mice. Tri-PyMT/Vim mice were 
obtained by crossing MMTV-PyMT, vimentin-Cre and Rosa26-RFP-GFP 
transgenic mice. a, b, Sections of primary tumours (a) and lungs (b) were 
immunostained for PyMT, E-cadherin and vimentin (in white pseudo- 
colour). Representative images are shown (n > 5 mice). Note that both 
primary tumours and lung metastases are largely composed of epithelial 
RFP+ tumour cells. 

Extended Data Figure 4 | Characterization of tri-PyMT cells. a, EMT 
of tri-PyMT cells with TGF-β. RFP+ tri-PyMT cells were sorted by flow 
cytometry and cultured in medium containing 2% FBS with or without 
TGF-β1 (2 ng ml⁻¹) for 3 days. Plot shows quantification of the percentage 
of GFP+ cells analysed by flow cytometry (n = 2 biological replicates). 
b, Cell migration assay of tri-PyMT cells. The tracing plots show the 
movement of individual RFP+ and GFP+ cells in 10 h of live imaging. 
Quantification plot (right panel) shows the average distance that RFP+ 
and GFP+ cells moved during the time frame (n > 20, *P < 0.01). 
c, Relative expression of epithelial, mesenchymal and tumour markers in 
sorted RFP+ and GFP+ tri-PyMT cells as determined by qRT-PCR with 
Gapdh as the internal control. n = 2 individual experiments. 
d, EMT of tri-PyMT cells is reported by fluorescent marker switch. Flow 
cytometry plot shows E-cadherin− (E-cad−) and E-cadherin+ (E-cad+) 
subpopulations of tri-PyMT cells (upper panel). Of the E-cad− and 
E-cad+ subsets, the populations were further dissected according to innate 
fluorescence (lower panel). Numbers indicate the percentage of GFP+, 
RFP+, or transitioning (Q2) cells in the parental E-cad− or E-cad+ subsets, 
respectively. 

Extended Data Figure 5 | Establishing an orthotopic model with sorted 
RFP+ tri-PyMT cells. a, Flow cytometry plots show tri-PyMT cells before 
and after sorting for RFP+ cells. Numbers indicate the percentage and 
purity of RFP+ cells used for establishing orthotopic breast tumours in 
mice. b, Schematic of the orthotopic breast tumour model with sorted 
RFP+ tri-PyMT cells. Cells are injected into the mammary gland of 
wild-type mice to generate primary breast tumours, followed by resection 
of the primary tumour at 4 weeks and evaluation of lung metastases after 
another 4 weeks. 
c, Characterization of tumour cells in the primary tumour, disseminated 
tumour cells (DTCs) and tumour cells in the lung metastasis of the 
tri-PyMT orthotopic model. Sections of primary tumours and lungs 
from tri-PyMT orthotopic mice were immunostained for E-cadherin and 
vimentin (in white pseudo-colour). Essentially all RFP+ tumour cells 
are detected as E-cad+/Vim−, while the scattered GFP+ tumour cells 
in the primary tumour are E-cad−/Vim+ (as indicated by arrows in the 
top panel). Representative images are shown (n = 8). d, Plot shows the 
percentage of GFP+ cells out of total tumour cells (GFP+ plus 
RFP+, n = 6). 

Extended Data Figure 6 | Characterization of EMT status of 
orthotopic tri-Vim-PyMT primary tumours. a, b, Sections of 
tri-Vim-PyMT orthotopic primary tumours (a) and metastatic lung 
(b) were immunostained for E-cadherin and vimentin (in white 
pseudo-colour). As expected, RFP+ tumour cells are entirely E-cadherin- 
positive and vimentin-negative, GFP+ tumour cells are vimentin-positive 
and E-cadherin-negative, and lung metastases are epithelial and RFP+. 

Extended Data Figure 7 | Dissemination of tri-PyMT cells in vivo. 
a, Disseminated tumour cells are RFP+ and epithelial. RFP+ tri-PyMT cells 
were injected into the fat pad of mice. The fluorescence of the primary 
tumour, circulating tumour cells in the blood and disseminated tumour 
cells in the lung were analysed by flow cytometry. The flow cytometry 
plots depict the enumeration of RFP+ and GFP+ cells. b, The ratios 
of detected RFP+ versus GFP+ cells are shown in the chart (n = 4 mice). 
c, Relative expression of miR-200-family microRNAs in tri-PyMT control 
and miR-200-expressing cells. n = 2 individual experiments. d, Relative 
expression of EMT markers and tumour markers in tri-PyMT control and 
miR-200-expressing cells as determined by qRT-PCR with Gapdh as the 
internal control, n = 2 individual experiments. 

Extended Data Figure 8 | Effects of CTX therapy on primary tumours. 
a, Quantification of primary tumour growth after 2 weeks of CTX therapy. 
For tumour growth data see accompanying Source Data. b, Proliferation 
status of primary tumour cells as detected by Ki67 staining in control 
mice and after 2 weeks of CTX therapy. c, Level of apoptosis in primary 
tumours as detected by active caspase-3 staining in control mice and 
after 2 weeks of CTX therapy. d, e, Representative images of Ki67 (d) and 
active caspase-3 staining (e) (white pseudo-colour) of primary tumours 
in control mice and CTX-treated mice. Scale bars, 50 μm. f, Proliferation 
status of RFP+ and GFP+ primary tumour cells as detected by Ki67 
staining in control and CTX-treated mice. g, Level of apoptosis in RFP+ 
and GFP+ primary tumours as detected by active caspase-3 staining in 
control and CTX-treated mice. h, Percentage of GFP+ tumour cells in 
control and CTX-treated primary tumours. n = 3 mice for all figures 
described above. Quantification performed using ImageJ software. 

Extended Data Figure 9 | EMT tumour cells are resistant to CTX 
treatment both in vitro and in vivo. a, b, Long-term CTX treatment 
in vitro results in a GFP+ population. Tri-PyMT cells were subjected to 
2 weeks of cyclophosphamide (+CTX) treatment (4 μM). Fluorescent 
imaging (a) and flow cytometry (quantified, b, n = 3) show the 
percentage of GFP+ cells in the CTX-treated culture compared to 
untreated control cells. c, d, EMT status of lung nodules in competitive 
survival assay. Representative fluorescent images of tri-PyMT lung 
metastases in untreated control lungs (c) and CTX-treated lungs (d), 
depicting RFP+ and GFP+ tumour cells. Immunostaining showing 
E-cadherin (E-cad) or vimentin (Vim) in white pseudo-colour. White 
arrow indicates GFP+ tumour cells with epithelial phenotypes (E-cad+/ 
Vim−), while the yellow arrow indicates GFP+ cells with mesenchymal 
phenotypes (E-cad−/Vim+). Nuclei were counter-stained with DAPI. 
n=): 

Extended Data Figure 10 | Gene expression profile analysis of RFP+ and 
GFP+ tri-PyMT cells. RFP+ and GFP+ tri-PyMT cells were sorted by flow 
cytometry and subjected to transcriptomic analysis by RNA-sequencing. 
a, Heat map of differentially expressed genes (adjusted P < 0.05) from 
RNA-seq of sorted RFP+ and GFP+ tri-PyMT cells, biologically duplicated. 
Genes that are established epithelial markers (Group 1) include Cdh1 
(which encodes E-cad), Dsp, Epcam, Fgfbp1, Krt18, Krt19, Ocln, Tjp3, 
Krt14 and Tjp2; the mesenchymal markers (Group 2) include Cdh2 (which 
encodes N-cad), Col23a1, Col3a1, Col5a1, Col6a2, Fsp1, Mmp3, Wnt5a and 
Zeb1. b, Cell cycle (left panel) and chemoresistance-related (right panel) 
genes alternatively regulated in RFP+ and GFP+ cells. c, GFP+ tri-PyMT 
cells were also sorted from CTX-treated (4 μM) samples. Interestingly, a 
branch of genes related to drug metabolism were significantly elevated in 
CTX-treated GFP+ cells. Group 1 genes are drug transporters including 
Abcb1a, Abcb1b and Abcc1. Group 2 genes are phase I drug-metabolizing 
enzymes including Adh7, Aldh1a1, Aldh1a3, Aldh1l1, Aldh1l2, Aldh2, 
Aldh3a1, Aldh3a2, Aldh3b2, Aldh4a1, Cyp11a1, Cyp2f2, Cyp2j6, Ptgs1 and 
Ptgs2. Group 3 genes are phase II drug-metabolizing enzymes including 
Aox1, Blvrb, Ces2e, Ces2f, Ces2g, Chst1, Ephx1, Fmo1, Gpx2, Gsta3, Gsta4, 
Gstm2, Gsto1, Gstp1, Gstt3, Maoa, Mgst1, Mgst2, Nat6, Nat9, Nqo1, Pon3, 
Ugt1a6a and Ugt1a7c. d, Aldehyde dehydrogenase (ALDH) activity assay. 
Cell lysates were prepared from flow cytometry-sorted RFP+ and GFP+ 
tri-PyMT cells. ALDH activity in samples was measured by OD at 450 nm 
in a kinetic mode (every 3 min for 60 min). Representative result from 
two independent experiments depicted. e, EMT tumour cells (GFP+ cells) 
showed resistance to multiple commonly used chemotherapies. Tri-PyMT 
cells were subjected to treatment with CTX (8 μM), doxorubicin (Dox, 
2 μM), paclitaxel (Taxol, 10 μM) and fluorouracil (5FU; 1.6 μM) for 3 days. 
Flow cytometry analysis of apoptotic cells was performed after Annexin 
staining. The percentage of dead cells (Annexin+) in RFP+ and GFP+ cells, 
respectively, was quantified. n = 2 biological replicates. 

Allosteric ligands for the pharmacologically dark receptors GPR68 and GPR65 

Xi-Ping Huang, Joel Karpiak, Wesley K. Kroeze, Hu Zhu, Xin Chen, Sheryl S. Moy, Kara A. Saddoris, 
Viktoriya D. Nikolova, Martilias S. Farrell, Sheng Wang, Thomas J. Mangano, Deepak A. Deshpande, Alice Jiang, 
Raymond B. Penn, Jian Jin, Beverly H. Koller, Terry Kenakin, Brian K. Shoichet & Bryan L. Roth 


At least 120 non-olfactory G-protein-coupled receptors in the human genome are ‘orphans’ for which endogenous 
ligands are unknown, and many have no selective ligands, hindering the determination of their biological functions 
and clinical relevance. Among these is GPR68, a proton receptor that lacks small molecule modulators for probing its 
biology. Using yeast-based screens against GPR68, here we identify the benzodiazepine drug lorazepam as a non-selective 
GPR68 positive allosteric modulator. More than 3,000 GPR68 homology models were refined to recognize lorazepam in a 
putative allosteric site. Docking 3.1 million molecules predicted new GPR68 modulators, many of which were confirmed 
in functional assays. One potent GPR68 modulator, ogerin, suppressed recall in fear conditioning in wild-type but 
not in GPR68-knockout mice. The same approach led to the discovery of allosteric agonists and negative allosteric 
modulators for GPR65. Combining physical and structure-based screening may be broadly useful for ligand discovery 
for understudied and orphan GPCRs. 


G-protein-coupled receptors (GPCRs)—the largest family of proteins 
encoded in the human genome—transduce signals for the most diverse 
endogenous ligands of any receptor family. Correspondingly, GPCRs 
are the most productive drug targets, with over 26% of US Food and 
Drug Administration (FDA)-approved drugs acting primarily through 
them. Astonishingly, of the 356 non-olfactory GPCRs, about 38% are 
understudied or ‘orphan’ receptors whose physiological roles, and 
often endogenous ligands, remain unknown. Given the central role 
of GPCRs in physiology and disease, and the high conservation of 
orphan GPCRs among organisms from worms to humans, under- 
studied and orphan GPCRs are probably functionally and therapeuti- 
cally important. Indeed, for the few GPCRs deorphanized since 2003 
(refs 1, 2 and http://www.guidetopharmacology.org/GRAC/FamilyDisplayForward?familyId=16), 
most have newly approved and investigational drugs. As with kinases, 
epigenetic proteins and proteases, 
ligands specific for orphan GPCRs will illuminate their biology and 
provide new areas for therapeutic intervention. 

A key impediment to GPCR deorphanization is uncertainty about 
the proteins through which they signal, making functional assays 
problematic. This difficulty is increased by the diverse ligands that 
GPCRs recognize, which range from protons and photons, small 
neurotransmitters and lipids, to peptides and folded proteins. Thus, 
generic functional screens are difficult for orphan GPCRs—one 
neither knows what class of compounds to screen, nor how to screen 
for it, much less how to demonstrate relevance—thereby explain- 
ing the slow progress in determining their roles in signalling and 
physiology. 


GPR68 (also known as OGR1) exemplifies both the important roles 
these understudied and orphan receptors are thought to serve, and 
our difficulties in illuminating them. Together with GPR4, GPR65 
and GPR132, GPR68 belongs to a family of proton-sensing GPCRs. 
GPR68 couples to several signalling pathways through Gq, Gs, G12/13 
or Gi/o proteins. GPR68 is expressed in many tissues and has 
been implicated in many processes, but it is most abundant in 
mouse cerebellum and hippocampus (http://www.brain-map.org/), 
suggesting yet to be identified roles in brain function. In acidic 
microenvironments, GPR68 seems to regulate inflammatory pro- 
cesses in airway smooth muscle and other cells. Surprisingly, 
studies with GPR68-knockout mice uncovered only modest changes 
in these functions. Although GPR68 has been reported to be 
activated by a family of isoxazoles, their weak activity seems to 
be nonspecific and could not be reproduced (see later). Thus, 
although GPR68 may have many roles, few of them are well- 
characterized by knockout and none is known in the central nervous 
system (CNS), where it is most highly expressed. Like other targets 
lacking small molecule reagents, GPR68 remains ‘pharmacological 
dark matter’. 

Here we describe an integrated experimental and computational 
approach to discover ligands that modulate GPR68. A lead compound 
that functions as a positive allosteric modulator (PAM) is demon- 
strated in vitro and in vivo, providing insights into GPR68 physiology. 
Application of the same approach found allosteric agonists and nega- 
tive allosteric modulators for a second understudied GPCR, GPR65, 
suggesting that the approach may be broadly useful. 

Figure 1 | Lorazepam is a GPR68-positive allosteric modulator. 
a, A library of approved drugs (10 μM) screened with yeast expressing 
chimaeric G,, Gg or GPR68 and chimaeric G, (GPR68,) revealed lorazepam 
as a true and toremifine as a false positive. b, Concentration-dependent 
stimulation of GPR68 G,-yeast growth by lorazepam and analogues. 

c, Structures of representative benzodiazepines (arrows denote methyl 
substituents that reduce GPR68 activity). d, Lorazepam is a GPR68-positive 
allosteric modulator for the agonist proton in the GPR68-mediated cAMP 
production. RLU, relative luminescence units. Data are mean ± s.e.m. of 
normalized results (a, b, d, n = 3) and concentration-response curves 
(b, d) were fit via a four-parameter logistic function (see Methods). 
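
For orientation, a four-parameter logistic curve of the kind mentioned in this legend can be written as a short function. The sketch below uses one common parameterisation (bottom, top, log EC50 and Hill slope, with x as log10 concentration); the exact form and fitting software used by the authors are described in their Methods, which are not part of this section, so the parameter names here are assumptions.

```python
def four_parameter_logistic(x, bottom, top, log_ec50, hill):
    """Response at log10 concentration x for a four-parameter logistic curve."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - x) * hill))


if __name__ == "__main__":
    # Hypothetical curve spanning 0 to 100 with a half-maximal point at x = -6.
    for log_conc in (-9, -7, -6, -5, -3):
        print(log_conc, four_parameter_logistic(log_conc, 0.0, 100.0, -6.0, 1.0))
```

At x equal to `log_ec50` the function returns the midpoint between `bottom` and `top`, which is a convenient check when fitting. For a proton agonist, x can be taken as the negative of pH.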


Figure 2 | Virtual screening workflow and predicted location of 
GPR68 allosteric site. a, Sequence alignment of GPR68, GPR4, GPR65 
and GPR132 to CXCR4 (details in Extended Data Fig. 2e). b, Docking 
of lorazepam and NCC library to five distinct binding sites (details in 
Extended Data Fig. 2f). c, Models evaluated by their favourable ranking 
of lorazepam versus decoy molecules. d, Optimizing the most favourable 
lorazepam binding mode. e, Optimized lorazepam orientation (grey stick) 

Yeast-based screen reveals GPR68 active compounds 
In an initial campaign with 24 selected orphan and understudied 
GPCRs, we modified a yeast assay system and screened a small 
library of approved drugs (http://www.nihclinicalcollection.com/ and 
Supplementary Fig. 1). We confirmed the known activity of short- 
chain carboxylic acids on the GPR41 and the GPR43 free fatty acid 
receptors (Extended Data Fig. 1a-d), and that of zinc (Extended Data 
Fig. 1e) and several other metals (Extended Data Fig. 1f-k) at GPR39. 
The most notable result was the finding that the benzodiazepine 
anxiolytic lorazepam was an agonist at GPR68 (Fig. 1). 

Lorazepam activated GPR68 signalling, stimulating yeast growth by 
more than twofold (Fig. 1a). N-unsubstituted benzodiazepines were 
more efficacious than N-substituted benzodiazepines (Fig. 1b, c and 
Supplementary Table 1) and activated the receptor at both pH 6.5 and 
7.4 (Extended Data Fig. 1l), with lorazepam most potently shifting 
the H+ concentration-response profile (Fig. 1d and Extended Data 
Fig. 1m-p). The pH-dependence of lorazepam activity suggested that 
it functions as a PAM of GPR68; lorazepam did not affect the activity 
of the related receptors GPR4 or GPR65 (Extended Data Fig. 2a, b). 
When profiled against a panel of CNS targets, lorazepam had substan- 
tial activity only at the GABA-A (γ-aminobutyric acid type A) receptor, 
its therapeutic target (Extended Data Fig. 3). 


Modelling the GPR68-lorazepam complex 

Little improvement in activity or selectivity was achieved by testing 
lorazepam analogues. This observation, and the potent GABA-A 
receptor activity of the drug, led us to seek specific, optimizable mol- 
ecules from computational docking screens of multi-million molecule 
libraries (Fig. 2). 

in GPR68 (cyan ribbon) and M2 muscarinic receptor (salmon ribbon; 
Protein Data Bank (PDB) code 4MQT) with allosteric site (grey) and 
orthosteric site (quinuclidinyl benzilate, magenta). f, Lorazepam in its 
predicted orientation and interactions. g, Virtual screen of ZINC subset 
(~3.1 million molecules) to identify predicted hits. h, ZINC67740571 
(magenta stick) in its predicted orientation and interactions in GPR68. 

Figure 3 | Identification, characterization and optimization of GPR68- 
positive allosteric modulators. a, Normalized results of GPR68-mediated 
cAMP production for selected compounds (ZINC database numbers) are 
shown; data represent mean ± s.e.m. (n = 4-34 measurements) at 10 μM 
for pH 7.40 and 6.50. Compounds were grouped into a first batch from the 
first round of virtual docking, and a second batch from the second round 
of docking. Compounds labelled Isx are isoxazole analogues. 
Lead compounds ZINC32587282, ZINC4929116, ZINC67740571 (ogerin), 

We generated 407 homology 3D models for GPR68 templated on 
the CXCR4 structure (29% sequence identity, Extended Data Fig. 2f), 
and these were expanded by another 2,900 models using elastic net- 
work modelling, which sampled backbone and loop conformations. 
Against each of the 3,307 models, we computationally docked the 
active benzodiazepines, more than 440 inactive compounds from the 
National Clinical Collection (NCC; http://nihsmr.evotec.com/evotec/sets) 
library, and 176 property-matched decoy molecules. In each 
model, we docked against five candidate allosteric sites (Extended 
Data Fig. 2g), based on the binding regions of aminergic GPCRs, the 
peptide and antagonist sites of CXCR4, and the muscarinic receptor 
allosteric site. Iterative cycles of modelling and optimization (Fig. 2b-e) 
attempted to capture two aspects of ligand binding. First, the activity of 
the benzodiazepines as PAMs, and second, the role of histidine residues 
17, 84, 169 and 269, which are thought to interact with one another in 
the inactive state, and move apart on protonation at lower pH values. 
This cycle converged to a stable lorazepam docking pose (Fig. 2f), and 
to its ranking first among the 622 decoy molecules. This strategy resem- 
bles previous ligand-guided docking, although here the binding site 
was unknown. In its docked geometry, lorazepam hydrogen bonds with 
Glu160, Arg189, Tyr244 and Tyr268, and forms non-polar contacts 
with Trp77, Leu101, Phe173, His269 and Leu272 (Fig. 2f). 
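
The model-selection idea described above (prefer receptor models in which the known active ranks ahead of decoys) can be sketched in a few lines. This is a schematic illustration, not the authors' docking pipeline: the function names, the model labels and all scores are hypothetical, and the convention that a lower score is more favourable is an assumption.

```python
def rank_of(target, scores):
    """1-based rank of `target` when molecules are sorted by ascending score.

    Assumes a lower docking score is more favourable.
    """
    ordered = sorted(scores, key=scores.get)
    return ordered.index(target) + 1


def best_model(models, target):
    """Name of the receptor model in which `target` achieves its best rank."""
    return min(models, key=lambda name: rank_of(target, models[name]))


if __name__ == "__main__":
    # Hypothetical scores for two receptor models and two decoys.
    example_models = {
        "model_A": {"lorazepam": -30.1, "decoy_1": -35.0, "decoy_2": -20.0},
        "model_B": {"lorazepam": -41.2, "decoy_1": -33.0, "decoy_2": -25.5},
    }
    print(best_model(example_models, "lorazepam"))
```

In practice such a ranking would be computed over thousands of models and hundreds of decoys, and ties or near-ties would need an explicit rule.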

To test the modelled lorazepam site, we mutated the Glu160, Arg189 
and His269 residues lining the site (Fig. 2f and Extended Data Fig. 2e, f), 
and determined their roles in proton-mediated cAMP production and 
calcium release (Extended Data Fig. 4). The His269Phe mutant right- 
shifted proton concentration-response curves in both assays, while sub- 
stitutions at Arg189 selectively abolished cAMP production. Different 
substitutions at Glu160 had varying effects at downstream signalling 
pathways—Glu160Ala left-shifted the proton concentration-response 
curve and reduced cAMP production, but was inactive in calcium 
release, while the Glu160Lys and Glu160Gln mutations had modest 
effects in both pathways (the mutants had little effect on expression, 
Extended Data Fig. 4c). These substantial and differential effects on 
downstream coupling support a role for these residues in the functions of 
GPR68, and are consistent with the modelled binding site for lorazepam. 


its isomer (ZINC32547799) and analogues (C2, C3 and C4) with different 
lengths of linkers, are highlighted. b-e, Concentration-response curves 
of normalized data (mean ± s.e.m.; n = 4) for ogerin (b), C2 (c), C3 (d) 
and C4 (e) are shown to illustrate the allosteric potentiation of proton, 
and were analysed using a standard operational allosteric model. Allosteric 
parameters are summarized in Supplementary Table 8, and curve-fitting 
details are in Methods. 


Seeking optimized PAMs, we computationally docked 3.1 million 
available lead-like molecules against the putative lorazepam site in 
GPR68. Overall, more than 3.3 trillion complexes were calculated and 
scored. From among the top 0.1% of the docking-ranked molecules, 17 
were purchased for testing; along with their high docking ranks, these 
compounds recapitulated key interactions made by lorazepam in its 
docked model, were chemically diverse and had high-scoring analogues 
(Supplementary Table 2). 

Four of the docking hits increased cAMP production by about 
1.5-fold over basal at pH 6.5 (Fig. 3a). Although none was as active as 
lorazepam, two compounds, ZINC4929116 and ZINC32587282, had 
hundreds of available analogues. These were docked against the GPR68 
model, and 25 were chosen for testing (Fig. 3a and Supplementary 
Table 3). Thirteen had greater activity than lorazepam, and their 
pH-dependent potentiation activity clearly indicates allostery. Although 
dissimilar, lorazepam and ZINC67740571 dock to form many of the 
same interactions, with the addition of a new predicted hydrogen-bond 
to Glu160 from the hydroxyl of ZINC67740571 (Fig. 2f, h and Extended 
Data Fig. 2h). 


Figure 4 | Ogerin modulates signalling and memory. a, b, Ogerin and 
ZINC32547799 (10 μM) modulate proton-mediated cAMP production 
(a, n = 4) and calcium mobilization (b, n = 5). Data in a and b are 
mean ± s.e.m. RFU, relative fluorescence units. c, d, Ogerin but not its 
isomer (ZINC32547799) decreased contextual memory retrieval in wild- 
type (WT; n = 7) but not GPR68-knockout (KO; n = 8) C57BL/6J male mice 
(c, Fa.27)=4.71, P < 0.05 for drug × genotype effect, P < 0.05 for ogerin in 
wild-type mice, two-way analysis of variance (ANOVA), Bonferroni’s 
post-hoc test); both had no effect on cued memory retrieval in either 
wild-type (n = 6) or knockout (n = 7) C57BL/6J male mice (d). Results 
(c, d) were normalized to vehicle control; see also Extended Data Fig. 8d-i. 

Ogerin as a selective GPR68 PAM 
Ten selected compounds were studied further in functional assays. 
According to the standard allosteric operational model, all were 
GPR68 PAMs, lacking intrinsic activity but increasing agonist potency 
(α-factor) for cAMP production by 1.9-8.2-fold, and increasing efficacy 
(β-factor) by 1.1-5.6-fold (Supplementary Table 5). It is this ability to 
shift concentration-response curves leftward and upward (Extended 
Data Fig. 4b) that is the key characteristic of a PAM. ZINC67740571 
had a much higher allosteric effect than lorazepam (Fig. 3b versus 
Fig. 1d, and Supplementary Table 8); we denoted it ‘ogerin’ (for OGR1 
ligand). 

Ogerin and ZINC32547799 are close analogues (Fig. 3a), but 
each had distinct functional activities (Fig. 4a and Extended Data 
Fig. 4f, g) and docking poses (Fig. 2 and Extended Data Fig. 2h). Thus, 
the ortho-hydroxylmethyl group, which differentiates them, may have 
a key role in determining PAM activity, perhaps because of its ability 
to hydrogen-bond with Glu160, which the meta-positioned hydroxyl- 
methyl in ZINC32547799 cannot reach. The structure-guided mutants 
His269Phe and Arg189Leu responded to ogerin and ZINC32547799 
differently (Fig. 4a, Extended Data Fig. 4f, g and Supplementary 
Table 6), supporting the modelled interactions with these residues. 
Notably, rather than activating, ogerin inhibited proton-mediated 
calcium release—a pathway-specific function rescued in Arg189Leu 
and His269Phe (Fig. 4b, Extended Data Fig. 4h, i and Supplementary 
Table 7). Meanwhile, ZINC32547799 had little effect on calcium 
release. To determine whether fast kinetics affect the difference 
between cAMP measurement (under equilibrium) and calcium 
release (non-equilibrium), we also conducted phosphatidylinositol 
hydrolysis assays under equilibrium. Ogerin slightly potentiated pro- 
ton activity here (Extended Data Fig. 4j, k), whereas ZINC32547799 
did not. Furthermore, ogerin had minimal PAM activity at the related 
proton-sensing GPCRs, GPR4 and GPR65 (Extended Data Fig. 2c, d). 
Ogerin seems to be a functionally selective GPR68 PAM for the agonist 
proton. 

If the ogerin-GPR68 model is relevant, we should be able to leverage 
it for optimization. We designed a virtual library of more than 600 
ogerin analogues and docked each into the GPR68 model (Extended 
Data Fig. 2h, i). Thirteen high-scoring analogues were synthesized, 
and three were more active than ogerin (Supplementary Table 9 and
Extended Data Fig. 6), including the first and seventh ranked compounds,
the latter of which, C2, had the greatest allosteric effect, shifting
the proton response threefold further to the left than does ogerin,
for an α-factor of 22 (Fig. 3a–c and Supplementary Table 8). C2 differs
from ogerin by the addition of a methylene to the benzylamine side 
chain, which places the phenyl ring deeper into a modelled apolar 
pocket (Extended Data Fig. 2i). The addition of one or two further 
methylenes in compounds C3 and C4 (Fig. 3a), conversely, reduced 
allostery (Supplementary Table 8 and Extended Data Fig. 6f), consistent 
with reduced complementarity to the apolar pocket in the modelled 
complex. 

To investigate ogerin specificity for GPR68 over unrelated targets, 
which might affect its usefulness as a biological probe, we first com- 
putationally screened ogerin and its analogues for off-targets using 
the Similarity Ensemble Approach (SEA) program*! against a panel of 
2,800 targets. These calculations revealed similarity between the GPR68 
ligands and those of only three other GPCRs: the ghrelin and adenosine 
A1 and A2A receptors. Subsequent physical profiling against 58 GPCRs,
ion channels and transporters (Extended Data Fig. 3) revealed that 
ogerin had moderate affinity at two GPCRs, 5-hydroxytryptamine 2B
(5-HT2B) and the A2A receptor (Extended Data Fig. 5h, i), the latter
consistent with the SEA prediction. 

Intrigued by the association between the GPR68 PAMs and aden- 
osine receptor antagonists, we computationally screened a library 
(http://www.tocris.com/dispprod.php?ItemId=5386#.U_s5ZMVdUrU) 
of 1,120 reagents and drugs against the GPR68 ligands, again using 
SEA. SLV320, a selective adenosine A1 antagonist*?, was predicted to
be a GPR68 PAM and confirmed by a physical screen of the full library 
(SLV320 αβ = 2.8) (Extended Data Fig. 7 and Supplementary Table 8), as
was a second adenosine receptor antagonist, CGH2466 (αβ = 2.9), and
tracazolate (αβ = 3.4), a GABAergic (GABA-mediated) drug that also
antagonizes adenosine receptors*’. Although CGH2466 has the lowest 
apparent binding constant (KB) of any GPR68 PAM (48 nM), its allostery
is much lower than that of ogerin; additionally, like SLV320 and tracazolate,
CGH2466 is a potent phosphodiesterase inhibitor (Extended
Data Fig. 7) and had minimal activity in the presence of Ro 20-1724. 
This previously unknown cross-talk among the GPR68, adenosine and 
GABA receptor ligands (Extended Data Fig. 7d), along with their activ- 
ities at phosphodiesterases, should be considered when evaluating the 
pharmacology of what have been considered specific probes and drugs. 


Ogerin as a GPR68 probe 

Given its activity and specificity, we sought to explore the downstream 
signalling and in vivo activity of ogerin. In GPR68-expressing HEK293 
cells, we found that both ogerin and lorazepam activate the protein 
kinase A (PKA) and mitogen-activated protein (MAP) kinase path- 
ways (Extended Data Fig. 8a), mimicking the low pH-induced signal- 
ling observed with GPR68 receptors in human airway smooth muscle 
cells’®. The activation of GPR68 in smooth muscle cells by extracellular 
acidification is linked to several downstream pathways and biological 
responses!®!9.22,34-37, which a selective allosteric modulator, such as 
ogerin, may help to disentangle. 

To investigate effects in behaviour associated with modulation of 
the hippocampus, where GPR68 is highly expressed’, we evaluated 
GPR68-knockout and wild-type mice in a learning and memory test, 
fear conditioning, in which the hippocampus has important roles 
(Extended Data Figs 8 and 11). In wild-type mice, ogerin attenuated 
contextual-based fear memory without effects on cue-based memory 
(Fig. 4c, d). The magnitude of these effects is comparable to those 
of compounds targeting other hippocampus-expressed GPCRs**?, 
and larger effects are rarely observed without surgical lesion of the 
hippocampus”. Crucially, the administration of ogerin had no effect 
on memory retrieval in GPR68-knockout mice (Fig. 4c, d), indicating 
that the in vivo effects of ogerin are GPR68-dependent. Furthermore, 
the less active ogerin isomer, ZINC32547799, had no measurable effect 
on learning and memory in wild-type mice (Fig. 4c, d and Extended
Data Fig. 8d–i). The effects of ogerin thus support a role for GPR68 in
hippocampal-associated memory.


General applicability of the approach

To explore the broader usefulness of this approach, we sought ligands
for GPR65, another understudied pH-sensing receptor, which shares
37% sequence identity with GPR68. We found that a recently reported
GPR65 agonist BTB09089 (ref. 41) is an allosteric agonist of GPR65
(Fig. 5d, e and Extended Data Fig. 10a). We used BTB09089 to anchor
modelling of GPR65, generating 500 homology models templated on
GPR68. The final docked GPR65–BTB09089 model resembles that of
GPR68–ogerin, with several side-chain substitutions in the putative
binding site (Fig. 5a).

We docked the same 3.1 million compounds against the GPR65
model, purchasing 45 new molecules for testing (Fig. 5a–c and
Supplementary Table 10). ZINC13684400 showed agonist activity
of more than twofold over basal at GPR65, with a potency of 500 nM,
without measurable activity in control cells (Fig. 5e and Extended Data
Fig. 9). As with BTB09089, ZINC13684400 did not potentiate proton
efficacy at GPR65 (Fig. 5d), but acted as an allosteric agonist. To test
the model, three residues modelled to interact with both BTB09089
and ZINC13684400, Arg187, Phe242 and Tyr272, were mutated, as was
Asp153, which appears to hydrogen-bond only with ZINC13684400
(Fig. 5b). Arg187Leu, Phe242Ala and Tyr272Ala reduced the activity
of both compounds (Extended Data Fig. 10f, g), whereas Asp153Ala
had no effect on BTB09089 but much reduced the activity of
ZINC13684400, consistent with the model. Several other docking hits
inhibited GPR65 when the receptors were activated by protons or by
BTB09089, including ZINC62678696 (Extended Data Fig. 10b–d).
Unexpectedly, ZINC62678696 does not compete with BTB09089, as
predicted, but rather acts as a BTB09089 negative allosteric modulator
(Fig. 5f), suggesting that the two molecules can bind to GPR65
simultaneously (Fig. 5g).

Figure 5 | Discovery of GPR65 allosteric agonist and negative allosteric
modulator. a–c, Predicted interactions of BTB09089 (a), ZINC13684400
(b) and ZINC62678696 (c) with GPR65. Overlaid ogerin (thin magenta
lines) (a) or BTB09089 (thin blue lines) docking poses with GPR68 or
GPR65, respectively (b, c). d, ZINC13684400 (30 µM) displayed GPR65
allosteric agonist activity at pH 8.40 but not at lower pH or in control cells
(n= : measurements). e, ZINC13684400 as a GPR65 agonist at pH 8.40
(n = 3). f, ZINC62678696 shifts BTB09089 curves downward at pH 8.40
(n = 4). KA and KB are the equilibrium binding affinities of the orthosteric
agonist proton (A) and allosteric modulator (B), respectively. Normalized
results (d–f) are mean ± s.e.m., and curves were analysed using a four-parameter
logistic function (e) or a standard operational allosteric model
(f). g, Predicted ternary complex between GPR65, ZINC62678696 and
BTB09089, detailed interactions (left) and overall orientation in the
GPR65 structure (right).


Discussion 

A combined empirical and structure-based approach discovered potent 
PAMs at the understudied receptor GPR68, and an allosteric agonist 
and negative allosteric modulators for the understudied GPR65. This 
supports the usefulness of the approach for illuminating the ‘dark 
matter’ of the GPCRs—the 38% of non-olfactory GPCR targets whose 
ligands and function are understudied or unknown!. Whereas truly 
high-throughput screens are impractical for targets of unknown func- 
tion, lower-throughput screens are often feasible. Although the hits 
from such a screen may be unsuitable as probes, they can anchor com- 
putational screens for more optimized compounds. Correspondingly, 
we would not ordinarily expect docking to succeed against models of 
a target that shares only 29% sequence identity with its nearest tem- 
plate. By calculating several thousand models, and insisting that the 
relevant ones are those that prioritize active over inactive molecules, 
functionally relevant models are prioritized. The new ligands that
emerged are specific for the target and one is active in vivo, supporting
their use as chemical probes for the function of GPR68.

Pharmacologically, the most unexpected observation was the activ- 
ity of GPR68 in learning and memory. Previous studies in GPR68- 
knockout mice revealed only modest phenotypic changes'®?!*, none in 
higher brain function, even though GPR68 is most highly expressed in 
the brain. Ogerin transiently and reversibly reduced contextual-based 
fear memory in wild-type but not GPR68-knockout mice, consistent 
with on-target activity in vivo. In hindsight, this is perhaps only accessi- 
ble to chemical modulators, which can have PAM activities. Inhibitory 
genetic perturbations, such as knockouts or knockdowns, although 
crucial to demonstrating on-target activity through chemical genetic 
epistasis, cannot on their own reveal such activation-based modulation. 

Deorphanizing a receptor can also illuminate its off-target roles for 
known drugs. The observation that lorazepam and its primary metab- 
olite, desmethyldiazepam, are GPR68 PAMs may clarify several of the 
idiosyncratic effects of this widely used anxiolytic. Lorazepam, uniquely 
among benzodiazepines, can treat catatonia, an effect proposed to 
involve an unknown secondary target”. GPR68 may have a role in this 
efficacy, as both drug and metabolite reach micromolar concentrations 
in plasma during treatment’. 

Certain caveats bear airing. The combination of empirical and com- 
putational screens will not work for all orphan receptors. GPCRs that 
are poorly expressed or non-functional in yeast or transfected cells 
will be problematic, and some orphans will simply not recognize any 
of the molecules screened in the small empirical libraries. Also, some 
orphans will bear too little similarity to templates of known structure 
to support accurate modelling. Even those that do work will demand 
cycles of testing and optimization, which was crucial for both GPR65 
and GPR68. 

These cautions should not obscure the key observations from this 
study—that combining empirical and structure-based screening led to a 
probe molecule that reveals some of the functions of GPR68. The find- 
ing that ogerin potentiates GPR68 activation and downstream MAP 
kinase pathways, and previous observations that the receptor mediates 
airway inflammation, enable campaigns for GPR68 PAMs that may
regulate respiratory inflammatory responses. Uniquely as PAMs, these 
compounds would have fidelity to the natural spatial and temporal 
activation of GPR68. Correspondingly, the role of GPR68 in anxiety 
offers a new route to treating this condition and related CNS disorders, 
an area in need of new therapeutic modalities’. Methodologically, this 
approach may have broad application to illuminating the function of 
the dark matter of the genome, that still large area of pharmacology in 
which targets are known, but function is hidden. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Chemicals, reagents and cell lines. Chemicals and reagents used in this study, if
not specified otherwise, were purchased from commercial sources (Sigma, Tocris, 
Fisher Scientific, or specified in Supplementary Tables 2 and 3 of chemical struc- 
tures) or synthesized as outlined in the Supplementary Information. HEK293 
(ATCC CRL-1573; 60113019; certified mycoplasma free and authentic by ATCC) 
and HEK293-T (HEK293T; ATCC CRL-11268; 59587035; certified mycoplasma 
free and authentic by ATCC) cells were from the ATCC. Cells were also validated 
by analysis of short tandem repeat (STR) DNA profiles and these profiles showed
100% match at the STR database from ATCC. Ogerin and its inactive analogue 
ZINC32547799 are available for use as chemical probes from Sigma-Aldrich 
(ogerin: SML1482, ZINC32547799: SML1483). 

Homology modelling. The alignment for the construction of the GPR68 mod- 
els was generated using PROMALS3D, and homology models were built with 
MODELLER-9v8 (ref. 45), using the crystal structure of the chemokine CXCR4 
receptor (PDB code 3ODU) as the template (Extended Data Fig. 2f). This alignment
was also used to generate 500 models of GPR65 directly from the final
GPR68 model. The initial alignment included both human and mouse sequences 
of GPR68, as well as those of its closest homologue, GPR4. These were aligned 
against the whole human C-X-C chemokine receptor family. The alignment was 
manually edited to: remove the amino and carboxy termini that extended past the 
template structure, remove the engineered T4 lysozyme, and create different align- 
ments of the flexible and non-conserved second extracellular loop (the final result 
is given in the provided alignment, Extended Data Fig. 2f). A total of 407 models 
were built directly based on the CXCR4 crystal structure, using MODELLER-9v8
(ref. 45), while five more were built from each of 580 elastic network models
(ENMs), produced by the program 3K-ENM*“*, for a total of 3,307 models built
during each iterative round of model refinement. Models with constraints between
pairs of extracellular His residues (His17–His169, His17–His269, His17–His84
and His84–His169) to mimic the inactive state of the protein were generated by
enforcing a distance constraint of 2.7 Å between the imidazole nitrogens, with a
standard deviation of 0.1 Å. Confirmed active compounds and analogues identified
using the CXCR4-based model had neither agonist nor antagonist activity at CXCR4
receptors (Extended Data Fig. 5j, k).
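
The model count and the His–His distance constraint described above can be illustrated with a minimal Python sketch. It assumes a generic Gaussian penalty of the form 0.5 × ((d − mean)/s.d.)²; the function name `gaussian_restraint_penalty` is hypothetical and this is not the restraint code of MODELLER itself.

```python
def gaussian_restraint_penalty(distance, mean=2.7, sd=0.1):
    """Dimensionless penalty for a distance (in angstroms) deviating from the target.

    Assumed generic Gaussian form; the defaults are the 2.7 A target and
    0.1 A standard deviation quoted in the text.
    """
    z = (distance - mean) / sd
    return 0.5 * z * z


# Model bookkeeping from the text: 407 models built directly on the template,
# plus five models from each of 580 elastic network models (ENMs).
direct_models = 407
enm_count = 580
models_per_enm = 5
total_models = direct_models + models_per_enm * enm_count

print("models per refinement round:", total_models)
print("penalty at 2.7 A:", gaussian_restraint_penalty(2.7))
print("penalty at 2.9 A:", gaussian_restraint_penalty(2.9))
```

Under this assumed form, a nitrogen–nitrogen distance two standard deviations from the target (2.9 Å) costs four times as much as a distance one standard deviation away.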

Model evaluation. Before docking, the second extracellular loop (EL2), between 
residues 161-177, was removed from each GPR68 model. Models were ranked on 
the basis of prioritizing active benzodiazepines (lorazepam and desmethyldiaze- 
pam) over the rest of the inactive NCC library that was used in the yeast screen, as 
well as over property-matched decoys. In addition, the docked pose of lorazepam 
had to form a hydrogen bond from its N-H group to a polar side chain in GPR68. 
Five different sites were sampled for possible lorazepam binding, based on the 
locations of the co-crystallized CXCR4 small molecule antagonist IT1t (in PDB
code 3ODU), cyclic peptide CVX15 (in PDB code 3OE0), and the positions of the
biogenic amines crystallized with the β2-adrenergic receptor (PDB code 2RH1) and
the dopamine D3 receptor (PDB code 3PBL). The entire NCC library was docked
to each of the five sub-sites for several rounds of iterative binding site refinement. 
In each round, the top-ranked models were examined for a binding pose that made 
hydrophobic and electrostatic interactions with the receptor, including the key 
N-H hydrogen bond. Residues within 6 Å of the lorazepam pose were minimized
around the docked ligand with PLOP*”. The NCC library was then re-docked 
into this optimized binding site for each model. This refinement continued for 
several cycles until the top-ranked models all converged to the same lorazepam 
pose. Once the final model was chosen, we built the EL2 back onto the receptor 
using MODELLER-9v8 (ref. 45) and optimized 1,000 different EL2 conformations
around the lorazepam pose with PLOP. Finally, we docked the NCC library
back into these 1,000 different EL2-GPR68 structures, and chose a final model 
that retained the previous pose and prioritized the active over the inactive com- 
pounds. The GPR65 model was generated similarly, using the pose of BTB09089 
as the primary selection criterion, although in this case the EL2 was always pres- 
ent. To determine the ternary complex model of ZINC62678696 and BTB09089, 
ZINC62678696 was docked to the putative binding site in the GPR65 model with 
BTB09089 present. Then, both ligands were minimized with PLOP. Next, the side 
chains of the GPR65 binding pocket were allowed to relax, and, finally, BTB09089 
and ZINC62678696 were simultaneously minimized again with PLOP. Structural 
models (PDB files) of characteristic GPR68-modelled complexes (with ogerin or 
lorazepam) and GPR65-modelled complexes (with BTB09089 or BTB09089 and 
ZINC62678696) are shown in the Supplementary Data. 
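
The ranking criterion described above, in which models are preferred when they place known actives ahead of inactive library compounds, can be pictured with a short Python sketch. Everything here is illustrative: the docking scores are invented, the function names `mean_active_rank_fraction` and `rank_models` are hypothetical, and the property-matched decoys and hydrogen-bond requirement used in the study are not modelled.

```python
def mean_active_rank_fraction(scores, actives):
    """Average rank of the actives as a fraction of library size (0 = best)."""
    ranked = sorted(scores, key=scores.get)  # lowest (best) docking score first
    positions = [ranked.index(name) for name in actives]
    return sum(positions) / (len(positions) * len(ranked))


def rank_models(model_scores, actives):
    """Order model names so that models prioritizing the actives come first."""
    return sorted(
        model_scores,
        key=lambda model: mean_active_rank_fraction(model_scores[model], actives),
    )


# Hypothetical docking scores (arbitrary units, lower is better).
model_scores = {
    "model_A": {"lorazepam": -40.0, "desmethyldiazepam": -38.0,
                "inactive_1": -30.0, "inactive_2": -25.0},
    "model_B": {"lorazepam": -20.0, "desmethyldiazepam": -35.0,
                "inactive_1": -33.0, "inactive_2": -36.0},
}
actives = ["lorazepam", "desmethyldiazepam"]

best_first = rank_models(model_scores, actives)
print(best_first)
```

In this toy case `model_A` ranks both actives at the top of its list and is therefore preferred over `model_B`, which scores an inactive compound best.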

Virtual screens. We used DOCK 3.6 to screen the ZINC database (Results). The 
flexible ligand sampling algorithm in DOCK 3.6 superimposes atoms of the docked 
molecule onto binding site matching spheres, which represent favourable posi- 
tions for individual ligand atoms. Forty-five matching spheres were used, using 
the previous refinement round’s pose of lorazepam. The degree of ligand sampling 
is determined by the bin size, bin size overlap and distance tolerance, set at 0.4 Å,
0.1 Å and 1.5 Å, respectively, for both the matching spheres and the docked
molecules. The complementarity of each ligand pose was scored as the sum of the 
receptor-ligand electrostatic and van der Waals’ interaction energies, and corrected 
for context-dependent ligand desolvation. Partial charges from the united-atom 
AMBER force field were used for all receptor atoms; ligand charges and initial 
solvation energies were calculated using AMSOL**? (http://comp.chem.umn.edu/amsol/).
The best-scoring conformation of each docked molecule was then
subjected to 100 steps of rigid-body minimization. 

Selection of potential ligands for testing. We docked the approximately 3.1 
million commercially available molecules of the lead-like subset of the ZINC data- 
base to the final GPR68 and GPR65 models. The full hit list was automatically 
filtered to remove molecules that possess high-internal-energy, non-physical con- 
formations, which are not well-modelled by our scoring function. The reported 
rankings reflect this filtering. From the top 0.1% (~3,000 molecules) of the docked 
ranking list, 17 compounds were chosen for testing, based on complementarity to 
the binding site and presence of predicted electrostatic interactions with Glu160, 
Arg189, Tyr244, Tyr268 and His269, mimicking those predicted for lorazepam. For 
GPR65, compounds were chosen based on complementarity to the binding site and 
similarity to the predicted binding pose of BTB09089, modelled to interact with 
Asp153, Arg187 and Tyr272, and by aromatic stacking with Trp70. 

In silico lead profiling. To examine specificity and to discover other potential 
GPCR targets for the newly discovered GPR68 PAMs, we used the SEA pro- 
gram?!*°, which compares individual ligands and sets of ligands to the ligand sets 
for multiple targets; two targets are related, or a particular ligand is predicted to 
modulate a target, if the sets of ligands are related to one another. Here, the query 
set was all of the new GPR68 PAMs, which was screened against either the 2,512
ligand–target sets with activity of 10 µM or better from the ChEMBL12 database”’,
or against the Tocris Mini library. 
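
The idea of relating two targets through the similarity of their ligand sets can be sketched in Python as follows. This is a simplified illustration only: fingerprints are represented as hypothetical sets of on-bits, the 0.5 threshold is an arbitrary assumption, and the statistical normalization against a random background that SEA applies to such raw scores is omitted.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def raw_set_similarity(query_set, target_set, threshold=0.5):
    """Sum the pairwise Tanimoto values that reach the (assumed) threshold."""
    total = 0.0
    for query_fp in query_set:
        for target_fp in target_set:
            tc = tanimoto(query_fp, target_fp)
            if tc >= threshold:
                total += tc
    return total


# Hypothetical fingerprints for two small ligand sets.
query_ligands = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
target_ligands = [frozenset({1, 2, 3}), frozenset({7, 8})]

raw_score = raw_set_similarity(query_ligands, target_ligands)
print(raw_score)
```

A higher raw score means that more ligand pairs across the two sets resemble one another; on its own it is not a significance estimate.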

Receptor constructs and yeast growth assays. Twenty-four human GPCR plasmids
(GPR1, GPR4, GPR15, GPR31, GPR39, GPR41, GPR43, GPR45, GPR55,
GPR57, GPR58, GPR62, GPR65, GPR68, GPR83, GPR84, GPR87, GPR88, 
GPR123, GPR132, GPR133, GPR157, GPR161 and ADCYAP1R1) were obtained 
from http://cdna.org, subcloned into the multiple cloning site of the yeast high copy 
number plasmid p426GPD (ref. 52) and were confirmed by full-length sequencing 
(Eton Bioscience). The yeast strains used were provided by M. Pausch (Merck) 
and have been previously described® and used by us”>*! ; MPY578t (Gi yeast),
MPY578q5 (Gq yeast) and MPY578s5 (Gs yeast) express chimaeric G proteins
in which the last five amino acids of the yeast Gα protein are replaced with their
mammalian Gi, Gq or Gs homologues, respectively. These strains contain the HIS3
gene under the control of the FUS1 promoter. GPCR transformants in yeast were 
selected and maintained on synthetic defined (SD) media lacking uracil (Clontech). 
GPR68, indicates the GPR68 paired with G, yeast; while GPR4, indicates GPR4 
paired with G, yeast, and similarly for the other GPCRs. The yeast screening assays 
were carried out as described previously*®. Assays were set up in 96-well flat-bottom
clear assay plates that contained 50 µl of test compound at 40 µM (final concentration
of 10 µM, in triplicate) diluted in SD-His-Ura medium (Clontech), 50 µl of
3-amino-1,2,4-triazole (3-AT) at 4× concentration diluted in SD-His-Ura medium
(pH 5.4), and 100 µl of yeast cell suspension diluted in SD-His-Ura medium to a
final A600 nm of 0.02. Growth was at 30 °C for 2–5 days. Before measurement of cell
growth, cells were re-suspended by repeated gentle pipetting to ensure uniform suspension
of cells. Cell growth was measured by absorbance at 600 nm in a microplate
reader (POLARstar Omega, BMG Biotech). After culling of data from obviously
contaminated wells, the A600 nm values of each individual well were adjusted as follows:
100 × (A600 nm of test well − A600 nm of plate median value) to give percentage
growth stimulation (positive values), or percentage growth inhibition (negative
values) in the form of mean ± s.e.m. of three wells.
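
The well adjustment described above can be written as a short Python sketch. It takes the stated formula literally (100 × the difference between a test well and the plate median); the function name `growth_change` and the absorbance readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def growth_change(test_wells, plate_median):
    """Return (mean, s.e.m.) of 100 x (A600 of test well - A600 of plate median).

    Positive values indicate growth stimulation, negative values inhibition.
    """
    adjusted = [100 * (a600 - plate_median) for a600 in test_wells]
    sem = stdev(adjusted) / sqrt(len(adjusted))
    return mean(adjusted), sem


# Hypothetical triplicate A600 readings for one compound and its plate median.
triplicate = [0.50, 0.54, 0.52]
plate_median = 0.40

mean_change, sem_change = growth_change(triplicate, plate_median)
print(f"growth change: {mean_change:.1f} +/- {sem_change:.2f}")
```

The s.e.m. here uses the sample standard deviation of the three adjusted wells divided by the square root of three.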

To measure and control constitutive activity or leaky HIS expression, each
receptor-yeast combination was plated as above in the absence of ligand over a
range of concentrations of 3-AT. Concentrations of 3-AT that showed moder- 
ate yeast growth (that is, A values of 0.2-0.6) after 2 days at 30°C were used in 
assays for drug screening. To measure concentration-dependent activity, various 
concentrations of cognate ligands diluted in SD-His-Ura medium were incu- 
bated with transformed yeast and appropriate concentrations of 3-AT for 2 days 
at 30°C. 

Site-directed mutagenesis. The GPR68 plasmid was obtained from http://cdna.org.
The mutations Glu160Ala, Glu160Lys, Glu160Gln, Arg189Leu, Arg189Met and
His269Phe in GPR68 and Asp153Ala, Arg187Leu, Phe242Ala
and Tyr272Ala in GPR65 were introduced with Agilent’s QuikChange II
site-directed mutagenesis kit and confirmed by sequencing. To tag the receptors
for comparing receptor expression levels by immunoblotting, a Flag epitope tag
was inserted at the C terminus of the GPR68 wild-type and mutant receptors, also 
using the QuikChange II site-directed mutagenesis kit. Insertion was confirmed 
by sequencing. 
Split-luciferase based cAMP reporter assays with proton receptors. GPR4,
GPR65 and GPR68 plasmids were obtained from http://cdna.org. GPR68 mutations
were made and confirmed as above. Receptor-mediated Gs activation was
measured using a split-luciferase reporter assay (GloSensor cAMP assay, Promega).
In brief, HEK293T cells were transiently co-transfected with receptor DNA and the
GloSensor cAMP reporter plasmid (GloSensor 7A). Transfected cells were plated
in poly-L-Lys-coated 384-well white clear bottom cell culture plates in DMEM
supplemented with 1% dialysed FBS at a density of 15,000 cells per well in a total
volume of 40 µl for a minimum of 6 h. Before assays, culture medium was removed
and cells were incubated with luciferin (4 mM prepared in drug buffer, pH 8.4)
for 90 min at 37 °C. The drug buffer was made with 1× HBSS supplemented with
10 mM HEPES and 10 mM MES modified from!’. TAPS was added to accommodate
higher pH values for some assays; no difference was observed between different
buffers under the same pH conditions. Cells plated at pH 8.4 for 6 h generated
the same H+ concentration–response curves as those plated at pH 7.4. To make
individual pH solutions, the pH was adjusted with NaOH and measured at room
temperature with a pH 211 Microprocessor pH meter (Hanna Instruments). To
measure modulator activity under different pH conditions, modulator was mixed
with pH solutions before adding to cells. To ensure that drug solutions
were delivered at the correct pH values, luciferin solution was removed from cell
plates before addition of drug solutions at predetermined pH values. To improve
solubility for some hydrophobic compounds, 1 mg ml−1 BSA was added to drug
solutions, and it had no effect on H+ concentration–response curves. For Gs
protein activity (cAMP production), the cell plate was usually incubated at room
temperature for 20 min before being counted in a luminescence counter. Results
were analysed using GraphPad Prism.
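
Because the agonist here is the proton, each pH solution corresponds to a point on an H+ concentration axis. A minimal Python sketch of that conversion is given below; the helper names are hypothetical, and activity coefficients are ignored (pH is treated simply as −log10 of the molar H+ concentration).

```python
from math import log10


def ph_to_molar(ph):
    """Proton concentration in mol per litre for a given pH (ideal approximation)."""
    return 10 ** (-ph)


def molar_to_ph(concentration):
    """pH for a given proton concentration in mol per litre."""
    return -log10(concentration)


# Example: proton concentrations at two pH values, and the fold difference.
h_at_6_5 = ph_to_molar(6.5)
h_at_7_4 = ph_to_molar(7.4)
fold_difference = h_at_6_5 / h_at_7_4

print(f"[H+] at pH 6.5: {h_at_6_5:.3e} M")
print(f"[H+] at pH 7.4: {h_at_7_4:.3e} M")
print(f"fold difference: {fold_difference:.2f}")
```

A shift of 0.9 pH units therefore corresponds to roughly an eightfold change in proton concentration.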

Allosteric operational model and data analysis. To estimate allosteric parameters, 
results were fitted to the allosteric operational model*”* as shown in the following 
equation: 


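$$
\text{Response} = \text{basal} + (E_{\max} - \text{basal}) \times \frac{\left(\tau_A[A](K_B + \alpha\beta[B])\right)^n}{\left([A]K_B + K_A K_B + K_A[B] + \alpha[A][B]\right)^n + \left(\tau_A[A](K_B + \alpha\beta[B])\right)^n}
$$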


In which: 

(1) Response is the measured activity in the form of RLUs for measurement of 
cAMP production. If the results were normalized, the ‘response’ is RLU in fold 
of basal (with buffer control as basal). 

(2) Emax is a system parameter, representing the maximal possible response of the 
system, and this value was normally constrained to the maximal reading of the 
corresponding experiment. 

(3) Basal is the baseline in the absence of test ligand, and is constrained to the 
baseline of the corresponding experiment. If results were normalized to fold 
of basal, the ‘basal’ was usually 1.0. 

(4) [A] and [B] represent concentrations of the orthosteric and allosteric ligands, 
respectively. In the case of GPR68, A is proton. 

(5) KA and KB are the equilibrium dissociation constants of the orthosteric
agonist proton (A) and allosteric modulator (B), respectively. To facilitate
curve-fitting with the model, KA is usually fixed to the binding affinity
determined from traditional radioligand binding assays under the assumption
that the experimentally derived binding affinity is not significantly
different from the functional affinity under the condition for the corresponding
functional assay. Since proton binding affinity is not a measurable
parameter in this assay system, the proton KA is therefore constrained to
the corresponding proton potency (EC50, the proton concentration for half-maximal
response) value in the absence of the allosteric ligand, under the
assumption that the proton potency is not significantly different from its
binding affinity when the cAMP production assay is carried out. Since protons
are present at relevant concentrations at physiological pH values, for
a proton receptor KB is largely a fitting parameter without a clear physical
meaning.

(6) The term τA is the orthosteric agonist proton efficacy parameter. Since allosteric
modulators in this study showed no agonist activity, the allosteric modulator
efficacy τB is therefore 0 and not included in the function.

(7) The term n is the slope factor linking receptor occupancy to response. Steep 
slopes in this study indicated high cooperativity between proton binding and 
receptor activation, probably reflecting the fact that the proton receptors oper- 
ate within a narrow physiological pH range. 

(8) The allosteric parameter α defines the mutual effect between the orthosteric
agonist A and the allosteric modulator B (α > 1 for increased affinity and α < 1
for reduced affinity); while β defines the allosteric effect on agonist efficacy
(β > 1 for increased efficacy and β < 1 for reduced efficacy).
With KA, basal and Emax constrained to their corresponding values, the parameters
KB, τA, α, β and n are globally shared fitting parameters for a family of
proton concentration–response curves in the absence and presence of increasing
concentrations of a test allosteric modulator. With the above settings, most curves
could be easily fitted to generate reasonable parameters. If Prism could not fit
the curves, but generated ‘ambiguous fitting’ results, the α value was then manually
constrained to an initial fitting value and systematically changed with small
increments or decrements until the highest stable high affinity value (KB) was
reached. For GPR65 and GPR68, KB represents the allosteric binding affinity in the
absence of protons, which is unmeasurable and thus has little physical meaning.
The value KB/(1+α) represents the binding affinity of an allosteric ligand in the
presence of protons, which could be estimated experimentally. For convenience,
we call KB/(1+α) the ‘Biochemical binding affinity, Kps’ (Supplementary Table 8)
for an allosteric ligand in the presence of an orthosteric agonist (in this case, H+).
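
The allosteric operational model with the modulator efficacy set to zero can be written as a small Python function, which may help in seeing how α and β move a proton concentration–response curve. This is a sketch under stated assumptions: the function name `allosteric_response` and all parameter values are hypothetical, and it evaluates the model only; it does not reproduce the global fitting carried out in Prism.

```python
def allosteric_response(a, b, ka, kb, tau_a, alpha, beta, n, basal, e_max):
    """Evaluate the allosteric operational model (modulator efficacy tau_B = 0).

    a, b   : concentrations of orthosteric agonist (proton) and modulator
    ka, kb : their equilibrium dissociation constants
    """
    stimulus = tau_a * a * (kb + alpha * beta * b)
    occupancy = a * kb + ka * kb + ka * b + alpha * a * b
    numerator = stimulus ** n
    return basal + (e_max - basal) * numerator / (occupancy ** n + numerator)


# Hypothetical parameters: responses expressed in fold of basal.
params = dict(ka=1e-7, kb=1e-5, tau_a=3.0, n=2.0, basal=1.0, e_max=10.0)

# Proton concentration equal to KA, without and with modulator at [B] = KB.
without_modulator = allosteric_response(1e-7, 0.0, alpha=10.0, beta=1.0, **params)
with_modulator = allosteric_response(1e-7, 1e-5, alpha=10.0, beta=1.0, **params)

print(f"no modulator:   {without_modulator:.3f}")
print(f"with modulator: {with_modulator:.3f}")
```

With [B] = 0 the expression collapses to the ordinary operational model for the agonist alone, and with α > 1 the response at a fixed sub-saturating proton concentration rises, which is the leftward shift expected of a PAM.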
Calcium mobilization assays. HEK293T cells were transfected and plated into
poly-L-Lys-coated 384-well black clear bottom cell culture plates in DMEM
supplemented with 1% dialysed FBS, at a density of 15,000 cells in 40 µl per well, and
incubated overnight. Before the assay, medium was removed and cells were loaded with
Fluo-4 Direct calcium dye (Invitrogen) for 60 min at 37 °C in a 5% CO2 atmosphere.
The calcium dye was prepared in drug buffer supplemented with 2.5 mM
probenecid, pH 8.0. Proton solutions were made with 1× HBSS, 7 mM HEPES, 7 mM
HEPPS and 7 mM MES, and pH was adjusted with NaOH. Drug additions and
fluorescence intensity measurement were carried out in a FLIPR Tetra, which was
programmed to add drug solutions to cells while recording fluorescence intensity.
To measure proton concentration-responses, 10 µl of solutions at pre-determined pH
were added to each well (with 20 µl calcium dye) while fluorescence intensity was
recorded during and after addition for 4 min (one reading per second). The
addition procedure was configured in such a way (30 µl per second at a height of 10 µl
above cells) that local proton concentrations for cells were essentially the same as
in the pH working solutions at the moment of addition. Fluorescence intensities
reached peak values within 30 s after drug addition. To determine the effects of
modulators on proton responses, the protocol was modified slightly. In brief, cells
were loaded with calcium dye as above, but only at 15 µl per well. The FLIPR Tetra
was programmed to first add 5 µl of 4× test compound (final concentration of
10 µM before addition of 10 µl of pH solutions) prepared with the same drug buffer
at pH 8.0 (buffer alone served as a control). After a total of 10 min of reading
and incubation, 10 µl of the pH solutions were added and the fluorescence
intensity was recorded in exactly the same way as above. Results (fluorescence intensity
in fold of basal) were exported and analysed in GraphPad Prism. For calcium
mobilization assays with 5-HT2B receptors, HEK293 cells stably expressing human
5-HT2B receptors were used instead of transiently transfected cells. Cells were
set up and tested in the same way as above, with 5-HT serving as an agonist control
(3 pM–30 µM), and with 1 nM 5-HT being used in the second addition to
determine the antagonist activity of ogerin.
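
The volumes in the modulator protocol can be checked with a short dilution calculation. The sketch below is only an illustration: the function name `final_concentration` is hypothetical, and the 40 µM stock value is inferred from the stated 4× stock and 10 µM final concentration rather than given in the text.

```python
def final_concentration(stock_conc, added_volume, volume_in_well):
    """Concentration after adding `added_volume` of stock to liquid already in the well.

    Any concentration unit may be used as long as the two volumes share a unit.
    """
    return stock_conc * added_volume / (volume_in_well + added_volume)


# 5 µl of a 4x stock (assumed to be 40 µM) added to 15 µl of dye gives 1x.
after_compound = final_concentration(40.0, 5.0, 15.0)

# The later 10 µl pH addition brings the well to 30 µl and dilutes the compound again.
after_ph_addition = after_compound * 20.0 / 30.0

print(f"after compound addition: {after_compound:.2f} µM")
print(f"after pH addition: {after_ph_addition:.2f} µM")
```

This matches the statement that the 10 µM figure applies before the pH solutions are added.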

Phosphatidylinositol hydrolysis assay. HEK293T cells were transfected for 24 h
and plated in poly-L-Lys-coated 96-well black clear bottom cell culture plates in
DMEM supplemented with 10% FBS, at a density of 60,000 cells in 100 µl per
well. After 5 h, cells were washed with inositol-free DMEM once and labelled with
3H-inositol (1 µCi per well, PerkinElmer) in inositol-free DMEM supplemented
with 5% dialysed FBS overnight. On the assay day, labelling medium was removed
and cells were washed once with assay buffer (1× HBSS, 10 mM HEPES, 10 mM
MES and 20 mM LiCl, pH 8.4). To measure drug concentration responses,
cells were then incubated with drug solutions at pH 8.4 for 20 min. To measure
proton concentration responses, the assay buffer was pre-adjusted to desired pH
values and supplemented with 20 mM LiCl. To measure the effect of ogerin or its
isomer ZINC32547799 on proton concentration–response curves, pH solutions
were supplemented with 20 mM LiCl and 10 µM ogerin or ZINC32547799. The
premixed drug solutions were added to cells for 20 min. At the end of incubation,
drug solutions were removed and 40 µl per well of 50 mM ice-cold formic acid
was added. After incubation at 4 °C for 30 min, the acid extracts were transferred
to polyethylene terephthalate 96-well sample plates (1450-401, PerkinElmer) and
mixed with 75 µl (200 µg) YSi RNA binding beads (RPNQ0013, PerkinElmer).
The plate was sealed and further incubated at 4 °C for 30 min before being counted
on a TriLux MicroBeta counter. Results (c.p.m. per well) were analysed using
GraphPad Prism.

Functional assays with A2A and CXCR4 receptors. Functional assays with A2A
adenosine and CXCR4 chemokine receptors were carried out using a slightly
different protocol from that previously described for Gs (above) and Gi receptors.
Specifically, HEK293T cells were transfected and plated using regular DMEM
supplemented with 1% dialysed FBS. Before assays, culture medium was removed,
and cells were incubated with 20 µl drug solution (prepared in drug buffer:
20 mM HEPES, 1× HBSS, pH 7.4) for 15 min at room temperature. To measure
agonist activity, 5 µl of 5× luciferin solution (4 mM final concentration) for A2A
(Gs-coupled GPCRs) or a mixture of luciferin and isoproterenol at a final
concentration of 200 nM for CXCR4 (Gi-coupled GPCRs) was added and cells were
incubated for another 20 min. To measure antagonist activity, test compound was
added first for 10 min before a reference agonist at a final concentration of EC80 for
another 10 min, followed by addition of luciferin for A2A or a mixture of
luciferin and isoproterenol for CXCR4 as above. Luminescence was measured in
a luminescence counter. Results were analysed in GraphPad Prism.

Radioligand binding assays. Radioligand binding assays with selected CNS targets
were carried out as described previously and as detailed in the PDSP protocol book
available online (http://pdsp.med.unc.edu/pdspw/binding.php). In brief, receptor 
membrane preparations were made from either animal brain tissues, or stable cell 
lines, or transiently transfected HEK293T cells. Receptor expression levels and 
radioligand binding affinities were determined with saturation binding assays. 
Competition binding assays were performed with membrane aliquots and a fixed 
concentration of radioligand in 96-well plates in a final volume of 125 ul. Reactions 
were incubated in the dark and at room temperature (22°C), and terminated by 
vacuum filtration onto 96-well formatted GF/B filters. Radioactivity on the filters 
was counted in a beta counter. Results were analysed in GraphPad Prism. 
Anti-HA immunoblots. HEK293 cells were transfected with either pcDNA3 
vector containing a haemagglutinin (HA) cassette within the multiple cloning 
site, or pCDNA3HA-GPR68 encoding human GPR68 with an N-terminal HA tag. 
Stable lines were generated by selection with 250 µg ml−1 G418, with >90% of cells
expressing HA after 2 weeks as assessed by immunocytochemistry (not shown).
Cells were plated into 12-well plates, grown to confluence, and media switched
to Ham's F12 media, with pH adjusted to 8.0 or 7.4, for 1 h. Cells were then
stimulated with vehicle, 50 µM ogerin, or 50 µM lorazepam for 10 min. Lysates
were collected and subjected to immunoblotting, with blots probed using primary
antibodies against HA (Sigma, cat H3663), total vasodilator-stimulated
phosphoprotein (VASP, BD Biosciences, cat 610448), p-p42/p44 (Cell Signaling, cat 5726S),
and β-actin (Sigma, cat A1978), and secondary antibodies (Licor, cat 926-32213
and 926-32210) conjugated with infrared fluorophores as described previously.
Anti-Flag immunoblots. HEK293T cells were transiently transfected in 10-cm 
dishes with Flag-tagged GPR68 wild-type and mutant receptors. Untransfected 
HEK293T cells served as a negative control. After 48 h, cells were collected, lysed 
and sonicated to shear chromatin before being subjected to immunoblotting. Blots 
were probed with monoclonal anti-Flag M2-peroxidase antibody (Sigma, A8952). 
Bands were quantified and normalized to GPR68 wild-type receptor (fold) for 
graphing. 

Data analysis and reporting. Other than in vivo studies (below), no statistical 
analysis was applied to yeast- or cell-based screening assays. Sample size (number 
of assays for each compound or receptor) was predetermined to be in triplicate 
or quadruplicate for primary screening assays at a single concentration. Some
samples were repeated more often than others in the primary screening assays, and
the number of measurements is specified as a range in the corresponding figure
legends. For concentration-response assays, the sample size (number of assays for
each compound at selected receptors) was also predetermined to be tested for a 
minimum of three assays, each in triplicate or quadruplicate. Samples or receptors 
were tested not randomly but in an alphabetic order or numeric order according 
to their coded names for easy organization and were thus blinded. For each batch
of assays, control assays with isoproterenol and proton concentration-responses
were included. If the potency value for either isoproterenol or protons was >0.5 log
unit away from established averages, assays with that batch of transfected cells
were excluded. For structure-activity relationship (SAR) studies, only the assays
in which all related compounds were tested side by side were included. None of 
the functional assays were blinded to investigators. 
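
The batch-exclusion rule above amounts to a simple comparison on the log scale. The following is a minimal sketch of that rule, not the authors' analysis code: the function name `batch_passes` and all potency values are hypothetical, and potencies are assumed to be expressed as pEC50 (for protons, the pH giving a half-maximal response).

```python
def batch_passes(measured, established, tolerance_log=0.5):
    """Return True if every control potency lies within `tolerance_log` log units.

    `measured` and `established` map a control name to its potency on a log
    scale (pEC50). A batch is rejected if any control deviates by more than
    the tolerance from its established average.
    """
    return all(
        abs(measured[name] - established[name]) <= tolerance_log
        for name in established
    )


# Hypothetical established averages and two hypothetical batches.
established = {"isoproterenol": 8.0, "proton": 7.0}
good_batch = {"isoproterenol": 8.3, "proton": 6.8}
bad_batch = {"isoproterenol": 8.1, "proton": 6.3}

print(batch_passes(good_batch, established))
print(batch_passes(bad_batch, established))
```

In the second batch the proton control is 0.7 log unit from its average, so all assays run with that batch of transfected cells would be set aside.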

Generation of GPR68-knockout mice. To generate GPR68-knockout mice, 
a probe specific for the human GPR68 transcript was generated by PCR amplifi- 
cation of a 450-base-pair (bp) segment of the coding sequence of the final exon 
of GPR68 using total placental RNA. The probe was used to identify a clone from 
a 129 mouse genomic lambda library. The genomic insert was subcloned and a 
restriction map generated using a panel of enzymes. The targeting construct for 
the GPR68 locus consists of a PGK-1 promoter driven neomycin resistance cassette 
flanked by two arms of homology with the mouse GPR68 locus. The longer arm 
of homology was generated using a 7,266-bp PstI fragment extending from the 
last intron to the beginning of the last exon. This exon contains the entire coding 
sequence of the GPR68 gene. The 1,335-bp shorter arm was generated by PCR 
amplification and extends from the downstream end of the long arm into the 
3’ untranslated region of the gene. Homologous recombination of the targeting con- 
struct with the GPR68 locus inserts the neomycin resistance cassette into codon 78 
of the gene, thereby disrupting expression. Correctly targeted cell lines were
identified by Southern blot analysis using a probe consisting of a 1,496-bp PstI fragment
immediately upstream of the long arm. This probe recognizes a 14,290-bp EcoRV
fragment in the endogenous locus and a 7,855-bp fragment in the targeted locus. 
Genotyping was carried out by PCR with three primers. The common (5’-GCAG 
AGGAAGCCCACGCTGATGTA-3’) and endogenous (5’-TAAACGGTAGCTGT 
GATTATTCAA-3’) primers generate a 516-bp PCR product from the endogenous 
locus, while the common and targeted (5’-AAATGCCTGCTCTTTACTGAAGG-3’)
primers generate a 465-bp product from the targeted locus. The chimaeras were 
bred to C57BL/6J mice and pups carrying the mutant allele identified. After ten 
successive crosses of heterozygous animals to C57BL/6J mice, heterozygous mice 
were intercrossed and a congenic Gpr68−/− and C57BL/6J breeding colony
established. The GPR68-knockout mice were profiled in several behavioural tests as
described below in detail and results are summarized in Extended Data Fig. 11 
and Supplementary Tables 11 and 12. 
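
The three-primer genotyping scheme maps band sizes to genotypes in a straightforward way. The sketch below illustrates that mapping under stated assumptions: the function name `call_genotype` and the 15-bp sizing tolerance are hypothetical, and only the 516-bp and 465-bp product sizes come from the text.

```python
ENDOGENOUS_BP = 516  # product of the common + endogenous primers
TARGETED_BP = 465    # product of the common + targeted primers


def call_genotype(band_sizes_bp, tolerance_bp=15):
    """Call a genotype from estimated PCR band sizes (in bp).

    `tolerance_bp` is an assumed allowance for gel sizing error; it must stay
    below half the 51-bp difference between the two expected products.
    """
    has_endogenous = any(abs(b - ENDOGENOUS_BP) <= tolerance_bp for b in band_sizes_bp)
    has_targeted = any(abs(b - TARGETED_BP) <= tolerance_bp for b in band_sizes_bp)
    if has_endogenous and has_targeted:
        return "heterozygous"
    if has_endogenous:
        return "wild type"
    if has_targeted:
        return "knockout"
    return "no call"


print(call_genotype([520]))
print(call_genotype([460, 515]))
print(call_genotype([468]))
```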

In vivo behavioural profiles of GPR68-knockout mice. Mice were maintained 
and handled according to the Guide for the Care and Use of Laboratory Animals 
approved by the Institutional Animal Care and Use Committee of the University 
of North Carolina at Chapel Hill. The goal of this study was to determine whether 
targeted deletion of GPR68 alters behavioural function in mice. 

Timeline for behavioural tests. The following tests were performed with mice at 
the ages shown in parentheses. Elevated plus maze test for anxiety-like behaviour 
(6-7 weeks); activity in an open field, accelerating rotarod (2 tests, 48h apart) 
(7-8 weeks); three-chamber social approach test, activity in an open field (re-test) 
(8-9 weeks); marble-burying assay (9-10 weeks); acoustic startle test, buried food 
test for olfactory ability (10-11 weeks); visual cue test in the Morris water maze 
(11-12 weeks); hidden platform test for spatial learning (12-14 weeks); reversal 
learning in the Morris water maze (14-16 weeks); second acoustic startle test, 
hotplate test for thermal sensitivity (16-17 weeks). 

Summary of results. Mice with deletion of GPR68 had normal performance 
in most of the behavioural tests. No effects of genotype were observed for body 
weights, activity and anxiety-like behaviour in an elevated plus maze or an open 
field, motor coordination, sociability, prepulse inhibition of acoustic startle 
responses or acquisition in the water maze. However, both male and female GPR68- 
knockout mice had small, significant decreases in acoustic startle responses, sug- 
gesting a reduced responsivity to environmental stimuli. Male GPR68-knockout 
mice also showed significant decreases in marble burying, a test for anxiety-like 
phenotypes. Overall, the findings indicate that GPR68 might have a role in specific 
domains of behaviour. 

Elevated plus maze. This test is used to assess anxiety-like behaviour in rodents. 
The procedure is based on a natural tendency of mice to actively explore a new 
environment, versus a fear of being in an open area. In the present study, mice were 
given one 5-min trial on the plus maze, which had two walled arms (the closed
arms, 20 cm in height) and two open arms. The maze was elevated 50 cm from the
floor, and the arms were 30 cm long. Animals were placed on the centre section
(8 × 8 cm), and allowed to freely explore the maze. Measures were taken of time
on, and number of entries into, the open and closed arms. All of the experimental 
groups showed a strong preference for the closed arms, in comparison to the open 
arms, of the elevated plus maze. As shown in Supplementary Table 11, there were 
no significant differences between the wild-type and GPR68-knockout mice for 
percentage time or percentage entries on the open arms, or for total entries during 
the task. 

Activity in an open field. Exploratory activity in a novel environment was assessed 
in an open field chamber (41 × 41 × 30 cm) crossed by a grid of photobeams
(VersaMax system, AccuScan Instruments). Counts were taken of the number of 
photobeams broken during the trial in 5-min intervals, with separate measures for 
ambulation (total distance travelled) and rearing movements. Time spent in the 
centre region of the open field was measured as an index of anxiety-like behaviour. 
Unfortunately, an equipment malfunction led to the loss of data for 8 mice during 
the first activity test, conducted when mice were 7-8 weeks in age. Therefore, a 
second activity test was given, when mice were 8-9 weeks in age. As depicted in 
Extended Data Fig. 11a, b, there were no significant differences between the wild- 
type and GPR68-knockout mice for distance travelled, or for rearing or centre 
time (data not shown), during the second activity test. A significant sex × time
interaction was found for the distance measure (F(11,335) = 2.68, P = 0.0025),
reflecting higher levels of activity in the female groups at the beginning of the session.
Accelerating rotarod test. Subjects were tested for motor coordination and learn- 
ing on an accelerating rotarod (Ugo Basile). For the first test session, animals were 
given three trials, with 45 s between each trial. Two additional trials were given 
48h later. Revolutions per minute (rpm) was set at an initial value of 3, with a 
progressive increase to a maximum of 30 rpm across five minutes (the maximum 
trial length). Measures were taken for latency to fall from the top of the rotating 
barrel. As shown in Extended Data Fig. 11c, d, deletion of GPR68 did not lead 
to deficits in motor coordination on the rotarod. In fact, during the first three 
acquisition trials, there was a non-significant trend for enhanced performance in
the male knockout group (repeated-measures ANOVA, genotype × sex
interaction, F(1,35) = 3.58, P = 0.0668).

Marble-burying assay. This procedure is used to evaluate anxiety-like behav- 
iour and repetitive responses. Mice were tested in a Plexiglas cage located in a 
sound-attenuating chamber with ceiling light and fan. The cage contained 5 cm 
of corncob bedding, with 20 black glass marbles (14 mm diameter) arranged in 
an equidistant 5 × 4 grid on top of the bedding. Animals were given access to the
marbles for 30 min. Measures were taken of the number of buried marbles
(two-thirds of the marble covered by the bedding). A two-way ANOVA indicated a
significant genotype × sex interaction (F(1,35) = 7.37, P = 0.0102) (Supplementary
Table 11). Post-hoc comparisons revealed that the male GPR68-knockout mice 
buried significantly fewer marbles than both male wild-type mice and female 
knockout mice in this task. 

Buried food test for olfactory function. Several days before the olfactory test, an 
unfamiliar food (Froot Loops, Kellogg Co.) was placed overnight in the home cages 
of the mice. Observations of consumption were taken to ensure that the novel food 
was palatable. Sixteen to twenty hours before the test, all food was removed from 
the home cage. On the day of the test, each mouse was placed in a large, clean tub 
cage (46 × 23.5 × 20 cm (width, length, height)), containing paper chip bedding
(3-cm deep), and allowed to explore for 5 min. The animal was removed from the 
cage, and one Froot Loop was buried in the cage bedding. The animal was then 
returned to the cage and given fifteen minutes to locate the buried food. Measures 
were taken of latency to find the food reward. As shown in Supplementary 
Table 11, there were no significant differences between the groups in latency to 
find the buried food. 

Hotplate test for thermal sensitivity. Individual mice were placed in a tall plastic 
cylinder located on a hotplate, with a surface heated to 55°C (IITC Life Science). 
Reactions to the heated surface, including hindpaw lick, vocalization or jumping, 
led to immediate removal from the hotplate. Measures were taken of latency to 
respond. The maximum test length was 30 s, to avoid paw damage. A two-way
ANOVA indicated a significant main effect of sex (F(1,35) = 8.83, P = 0.0053), and
genotype × sex interaction (F(1,35) = 4.3, P = 0.0455) (Supplementary Table 11).
Post-hoc comparisons revealed that the male GPR68-knockout mice had signifi- 
cantly lower latencies to respond than female knockout mice. 

Acoustic startle method. The acoustic startle test can be used to assess auditory 
function and sensorimotor gating. The test is based on the measurement of the 
reflexive whole-body flinch, or startle response, that follows exposure to a sudden 
noise. Mice can be evaluated for levels of startle magnitude and prepulse inhibition, 
which occurs when a weak prestimulus leads to a reduced startle in response to 
a subsequent louder noise. For this study, animals were tested with a San Diego 
Instruments SR-Lab system. In brief, mice were placed in a small Plexiglas cylin- 
der within a larger, sound-attenuating chamber. The cylinder was seated upon a 
piezoelectric transducer, which allowed vibrations to be quantified and displayed 
on a computer. The chamber included a house light, fan, and a loudspeaker for the 
acoustic stimuli. Background sound levels (70 dB) and calibration of the acoustic 
stimuli were confirmed with a digital sound level meter (San Diego Instruments). 
Each session consisted of 42 trials, which began with a 5-min habituation period. 
There were seven different types of trials: the no-stimulus trials, trials with the 
acoustic startle stimulus (40 ms; 120 dB) alone, and trials in which a prepulse 
stimulus (20 ms; 74, 78, 82, 86 or 90 dB) occurred 100 ms before the onset of the 
startle stimulus. Measures were taken of the startle amplitude for each trial across 
a 65-ms sampling window, and an overall analysis was performed for each
subject’s data for levels of prepulse inhibition at each prepulse sound level (calculated
as 100 − (response amplitude for prepulse stimulus and startle stimulus together /
response amplitude for startle stimulus alone) × 100).
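
The prepulse inhibition formula above can be written as a small function. This is a sketch for illustration only: the function name `prepulse_inhibition` and the amplitude values are hypothetical, and amplitudes are assumed to be in the same arbitrary units for both trial types.

```python
def prepulse_inhibition(prepulse_plus_startle, startle_alone):
    """Percent prepulse inhibition: 100 - (prepulse+startle / startle alone) * 100."""
    if startle_alone <= 0:
        raise ValueError("startle-alone amplitude must be positive")
    return 100.0 - (prepulse_plus_startle / startle_alone) * 100.0


# Hypothetical mean amplitudes for one subject at each prepulse level (dB).
startle_alone = 1000.0
with_prepulse = {74: 750.0, 78: 600.0, 82: 500.0, 86: 400.0, 90: 250.0}

ppi = {db: prepulse_inhibition(amp, startle_alone) for db, amp in with_prepulse.items()}
for db in sorted(ppi):
    print(f"{db} dB prepulse: {ppi[db]:.1f}% inhibition")
```

A value of 0% means the prepulse had no effect on the startle response, and negative values indicate a larger response with the prepulse than without it.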

Results from acoustic startle test. The GPR68-knockout mice had decreased 
startle responses after presentation of acoustic stimuli, in comparison to the 
wild-type mice (Extended Data Fig. 11e, f). A repeated-measures ANOVA,
conducted on startle response amplitudes, indicated significant main effects of
genotype (F(1,35) = 7.22, P = 0.011) and sex (F(1,35) = 16.61, P = 0.0003), and a
genotype × decibel level interaction (F(6,210) = 5.77, P < 0.0001). Separate
comparisons confirmed that both male and female knockout mice showed
significant reductions in startle responses (genotype × decibel level interaction, males,
F(6,84) = 2.57, P = 0.0245; and females, F(6,126) = 3.48, P = 0.0032). The decreased
startle responses and overt sex differences were not associated with changes in 
prepulse inhibition (Extended Data Fig. 11g, h). The significant main effects of 
genotype on startle were no longer evident during a second acoustic startle test, 
conducted when mice were 16-17 weeks in age. 

Morris water maze, visible platform test. The Morris water maze task was used to 
assess spatial learning and visual function in the mice. The water maze consisted 
of a large circular pool (diameter = 122 cm) partially filled with water (45 cm deep, 
24-26 °C), located in a room with numerous visual cues. Mice were first tested
using a visible platform. In this case, each animal was given four trials per day, 
across 2 days, to swim to an escape platform cued by a patterned cylinder extending 
above the surface of the water. For each trial, the mouse was placed in the pool at 
one of four possible locations (randomly ordered), and then given 60s to find the 
visible platform. If the mouse found the platform, the trial ended, and the animal 
was allowed to remain 10s on the platform before the next trial began. If the plat- 
form was not found, the mouse was placed on the platform for 10s, and then given 
the next trial. Measures were taken of latency to find the platform via an automated 
tracking system (Noldus Ethovision). As shown in Supplementary Table 12, all 
groups of mice demonstrated a high degree of proficiency in the visual cue task. 
Acquisition and reversal learning in the hidden platform test (Extended
Data Fig. 11i–l). Three days after the visual cue task, mice were tested for their
ability to find a submerged, hidden escape platform (diameter = 12 cm). As in the 
procedure for visual cue learning, each animal was given four trials per day, with 
1-min per trial, to swim to the hidden platform. The criterion for learning was an 
average latency of 15s or less to locate the platform on 1 day. Mice were tested until 
the criterion was reached, with a maximum of 9 days of testing. When criterion 
was reached, mice were given a 1-min probe trial in the pool with the platform 
removed. In this case, selective quadrant search was evaluated by measuring num- 
ber of crosses over the location where the platform (the target) had been placed 
during training, and the corresponding areas in the other three quadrants. After the 
acquisition phase, mice were tested for reversal learning, using the same procedure 
as described above. In this phase, the hidden platform was located in a different 
quadrant in the pool, diagonal to its previous location. As before, measures were 
taken of latency to find the platform. On the day that the criterion for learning was 
met, the platform was removed from the pool, and the group was given a probe 
trial to evaluate reversal learning. 
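
The learning criterion described above (an average latency of 15 s or less across the four trials of one day, within at most 9 days) can be expressed as follows. This is a sketch under stated assumptions: the function name `first_criterion_day` and the latency values are hypothetical, and each inner list is assumed to hold one day's trial latencies in seconds.

```python
def first_criterion_day(daily_latencies, criterion_s=15.0, max_days=9):
    """Return the first day (1-based) on which mean latency meets the criterion.

    Returns None if the criterion is not met within `max_days` of testing.
    """
    for day, trials in enumerate(daily_latencies[:max_days], start=1):
        if sum(trials) / len(trials) <= criterion_s:
            return day
    return None


# Hypothetical latencies (s) for one mouse, four trials per day.
mouse = [
    [60.0, 52.0, 48.0, 60.0],
    [41.0, 30.0, 25.0, 33.0],
    [20.0, 14.0, 18.0, 12.0],
    [12.0, 9.0, 15.0, 10.0],
]
print(first_criterion_day(mouse))
```

On the day returned by this check, the platform would be removed and the probe trial given.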

For the above behavioural profiling studies, subjects were 21 wild-type mice 
(9 males and 12 females) and 18 GPR68-knockout mice (7 males and 11 females), 
on a C57BL/6 background. Sample sizes were not statistically predetermined. 
Testing began when animals were 6-7 weeks of age. For each procedure, measures 
were taken by an observer blinded to mouse genotype (wild type or knockout) 
and no animals were excluded from analysis. Data were analysed using one-way 
or repeated-measures ANOVA. Fisher’s protected least-significant difference tests 
were used for comparing group means only when a significant F value was deter- 
mined. Within-group comparisons were conducted to determine side preference in 
the social behaviour tests. For all comparisons, significance was pre-set at P< 0.05. 
Effect of ogerin and its analogue ZINC32547799 on learning and memory. 
Contextual and cue-dependent learning and memory were evaluated using a 
Near-Infrared Video Fear Conditioning system (MED Associates). Test
chambers (29 × 25 × 25 cm) had transparent walls and metal rod floors, and were
enclosed in sound-attenuating boxes. The conditioned fear procedure had three 
phases: training, a test for contextual learning, and a test for cue-dependent learn- 
ing. Before each phase, mice were moved to a holding room adjacent to the test 
room and acclimated for at least 30 min. In the 8-min training phase, mice received
three pairings of a 30-s, 90-dB, 5-kHz tone (the conditioned stimulus) and a 2-s, 
0.6-mA foot shock (the unconditioned stimulus), in which the shock was presented 
during the last 2s of the tone. Context-dependent learning was evaluated 24h 
after the training phase. Mice were placed back into the original test chamber, and 
levels of freezing (immobility) were determined across a 5-min session, without 
the presence of the conditioned or unconditioned stimulus. Forty-eight hours after 
the training phase, mice were evaluated for associative learning to the auditory 
cue (the conditioned stimulus) in a final 6-min session. The conditioning cham- 
bers were modified using a Plexiglas insert to change the wall and floor surface, 
and a novel odour (vanilla flavouring) was added to the sound-attenuating box. 
Baseline behaviour was scored for 2 min, and then three 30-s conditioned stimulus 
tones were presented across a 4-min period. Levels of freezing were automatically 
measured by the image tracking software (Med Associates). Freezing was defined 
as no movement (below the movement threshold) for 0.5 s. To evaluate the effect
of drug, strain-matched groups of animals were given ogerin (10 mg kg−1 in 10%
Tween 80 or saline) 30 min before the training.
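
The freezing definition above (movement below threshold for at least 0.5 s) can be illustrated with a simple run-length calculation. This sketch is not the tracking software's algorithm: the function name `percent_freezing`, the per-frame motion values, the threshold and the frame rate are all assumptions made for the example.

```python
def percent_freezing(motion, threshold, fps=30.0, min_duration_s=0.5):
    """Percentage of frames that belong to freezing bouts.

    A bout is a run of consecutive frames whose motion value is below
    `threshold` and which lasts at least `min_duration_s`.
    """
    if not motion:
        raise ValueError("motion trace is empty")
    min_frames = int(round(min_duration_s * fps))
    frozen_frames = 0
    run_length = 0
    # A sentinel at the threshold closes any run still open at the end.
    for value in list(motion) + [threshold]:
        if value < threshold:
            run_length += 1
        else:
            if run_length >= min_frames:
                frozen_frames += run_length
            run_length = 0
    return 100.0 * frozen_frames / len(motion)


# Hypothetical trace at 10 frames per second: a 0.6-s still period counts as
# freezing, while a 0.3-s still period is too short.
trace = [0, 0, 0, 0, 0, 0, 5, 0, 0, 0]
print(percent_freezing(trace, threshold=1, fps=10.0))
```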

For the learning and memory studies, sample sizes (number of animals) were 
not predetermined by a statistical method, and a minimum of six male animals (age
of 6-8 weeks) were used in each group (exact number of animals was specified in 
figure legends). Animals were assigned to groups randomly and experiments were 
not blinded to investigators. No animals were excluded from analysis. Statistical 
analyses were performed after first assessing the normality of distributions of data 
sets. Comparisons between groups were made using unpaired t-tests. Welch’s 
corrections were used when variances between groups were unequal. Comparisons 
between groups during conditioning, contextual and cued memory tests were 
assessed using two-way ANOVA with P < 0.05 being considered significant. 
Extended Data Figure 1 | Validation and confirmation of GPCR
activation assays. a–o, Yeast (a–k) and HEK293T cell (l–o) GPCR
activation assays. a-d, Concentration-dependent growth of GPR43- 
expressing G; yeast (a), GPR43-expressing G, yeast (b), GPR41-expressing 
G, yeast (c), and GPR41-expressing Gy yeast (d) in response to various 
short-chain fatty acids (SCFAs). e-h, Concentration-dependent 

growth of GPR39-expressing G, yeast (GPR39,) in response to zinc 

ions (e), chromium ions (f), cadmium ions (g) and iron ions (h). 

i-k, Concentration-dependent cAMP responses of GPR39-expressing 
HEK293T cells to ZnCl2 (i), ZnSO4 (j) or CdSO4 (k) as measured by
luciferase cAMP reporter assay. l, N-unsubstituted benzodiazepines
(lorazepam, clonazepam, desmethyldiazepam and norfludiazepam; 10 µM)
stimulated cAMP production in a GPR68- and pH-dependent manner.
Data are mean ± s.e.m. (n = 3-66 measurements). m–p, Concentration–response
curves of N-unsubstituted benzodiazepines lorazepam (m),
desmethyldiazepam (n), clonazepam (o) and norfludiazepam (p) at
pH 6.50 or 7.40 in GPR68-transfected HEK293T cells (structures in
Supplementary Table 1). Normalized results represent mean ± s.e.m.
(n = 3) and curves were analysed in GraphPad Prism using the built-in
four-parameter logistic function.
Extended Data Figure 2 | Lorazepam and ogerin have minimal GPR4
or GPR65 activity. a–d, Effect of lorazepam (a, b) or ogerin (c, d) on
GPR4 (a, c) or GPR65 (b, d); data represent normalized mean ± s.e.m.
(n = 3). Sequence alignment of proton-sensing receptors and docking poses
for ogerin and its analogues. e, GPR68 snake plot showing extracellular
loops and transmembrane domains (upper portion); important residues 
are highlighted. Glu160, Arg189 and His269 were mutated in this study. 

f, Sequence alignment of GPR4, GPR65 and GPR68 to CXCR4 (PDB code
3ODU) (PROMALS-3D) was manually refined to reduce gaps and to
position conserved residues. TM, transmembrane regions; IL, intracellular
loop; EL, extracellular loop. Conserved residues are highlighted in blue by
degree of conservation, while red boxes indicate residues important for
receptor function. Red stars indicate residues mutated in this study.
g, Sampling different regions for lorazepam binding modes in GPR68.
Yellow and grey surfaces contour the binding site of IT1t and CVX15 in
CXCR4 crystal structures (PDB codes 3ODU and 3OE0, respectively),
while green and red surfaces sample the entire binding pocket. The
magenta surface represents the canonical orthosteric biogenic amine
site. h, ZINC32547799 in its predicted orientation and interactions with
GPR68. i, Optimization of ogerin (magenta, thin lines) to C2 (brown, 
structure in Fig. 3a) by insertion of a single methylene is predicted to 
improve packing in the aryl pocket of the ogerin site. Adding a second 
methylene, thus creating a propyl linker in C3 (yellow, structure in Fig. 3a), 
is predicted to disrupt the packing and thus to reduce the allosteric effect. 
Extended Data Figure 3 | Heat map of off-target activities of lead
compounds at potential CNS drug targets. Radioligand binding assays
were carried out by the National Institute of Mental Health Psychoactive
Drug Screening Program (NIMH PDSP) as described previously
(online protocols available at http://pdsp.med.unc.edu/pdspw/binding.php).
Values represent mean binding affinities (pKi, n = 2-4). Affinities
lower than a pKi of 5, or less than 50% inhibition at 10 µM, are shown
as a minimum of 5 on the pKi scale. The hERG inhibition activity was
tested in a hERG functional assay as previously published. AMPA,
aminomethylphosphonic acid receptor; BZP, benzodiazepine receptor;
DAT, dopamine transporter; DOR, delta (δ) opioid receptor; KA, kainate
acid receptor; KOR, kappa (κ) opioid receptor; MOR, mu (µ) opioid
receptor; NAT, noradrenaline transporter; NMDA, N-methyl-D-aspartate
receptor; ND, not determined; PBR, peripheral benzodiazepine binding
site; SERT, serotonin transporter; hERG, human ether-a-go-go-related
gene (potassium channel Kv11.1).
Extended Data Figure 4 | Confirmation of modelling results via
mutagenesis. a, b, Protons showed agonist activity at GPR68 wild-type 
and mutant receptors in cAMP production (a) and calcium release (b); 
parameters are in Supplementary Table 4. c, Relative GPR68 wild-type 
and mutant receptor expression levels determined by anti-Flag 
immunoblotting (n = 3). d, Proton-mediated cAMP production in 
untransfected cells (n = 16). e, Calcium release by lorazepam and 
selected ZINC compounds (10 uM at pH 8.0, n = 6-22 measurements). 
f–j, Effect of ogerin and ZINC32547799 (10 µM) on proton-mediated
cAMP production (f and g, n = 4), calcium release (h and i, n = 3), and
phosphatidylinositol hydrolysis (j, n = 3) at GPR68 wild-type or
mutant-transfected HEK293T cells. k, Effect of ogerin and ZINC32547799 on
phosphatidylinositol hydrolysis at pH 8.4 at GPR68-transfected
HEK293T cells (n = 3). Normalized results represent mean ± s.e.m. and
curves were analysed using a four-parameter logistic function.
Extended Data Figure 5 | Control experiments for signalling and
pharmacology. a, Basal cAMP production of GPR68 wild-type and
mutant receptors (mean ± s.e.m., n = 24-46 measurements).
b, pH-dependent activity of ogerin at GPR68 wild type (mean ± s.e.m., n = 3).
c, Ogerin concentration–responses at GPR68 wild-type and mutant
receptors at pH 9.0 (mean ± s.e.m., n = 3), under which cAMP reporter
assay was not affected (d–f). d–g, Proton modulated isoproterenol-mediated
Gs-activation via β-adrenergic receptors in untransfected (d, f)
and GPR68-transfected (e, g) cells. Normalized results (basal at
pH 9.5 for d and e; or corresponding buffer control for f and g) represent
mean ± s.e.m. (n = 6). h, i, Inverse agonist and antagonist activity
(Ki of 220 nM) of ogerin at A2A (cAMP production, h) and weak antagonist
activity (Ki of 736 nM) at 5-HT2B receptors (calcium mobilization, i).
5′-N-ethylcarboxamidoadenosine (NECA) and 2-chloro-N6-
cyclopentyladenosine (CCPA) served as agonist controls, while CGS15943
is an inverse agonist control for A2A receptors. Normalized results
represent mean ± s.e.m. (n = 3). Curves were analysed in GraphPad Prism
with the built-in four-parameter logistic function. j, k, Lead compounds
(10 µM) showed no agonist (j) or antagonist (k) activity at CXCR4
receptors (cAMP production) with CXCL12 as an agonist control (1 or
3 µM) or AMD 3100 (10 µM) as an antagonist control. Results represent
mean ± s.d. (n = 2).
Extended Data Figure 6 | Primary screening and comparison of
allosteric parameters of 13 ogerin analogues at GPR68. The 13 ogerin
analogues (structures in Supplementary Table 9) identified from docking
a virtual library of more than 600 ogerin derivatives were synthesized
(Supplementary Information). a-e, Production of cAMP was measured in
transiently transfected HEK293T cells at 10 µM and five different
pH conditions, pH 8.4 (a); pH 7.9 (b); pH 7.4 (c); pH 7.0 (d); and pH 6.5 (e),
of the allosteric parameters log α and log β. Proton concentration-responses
were carried out in the absence and presence of increasing concentrations
of ogerin and its analogues; results were analysed using a standard
allosteric operational model to obtain allosteric parameters. Values
represent mean ± s.e.m. (n ≥ 3; see details in Supplementary Table 8).
Extended Data Figure 7 | Characterization of potent GPR68 PAMs.
a-c, Concentration-response curves of H+ in the absence and presence
of increasing concentrations of CGH2466 (a, a′), tracazolate (b, b′) and
SLV320 (c, c′) and in the absence (left column, a, b, c) and presence (right
column, a′, b′, c′) of phosphodiesterase inhibitor (Ro 20-1724, 30 µM)
at GPR68-expressing cells. Normalized results (mean ± s.e.m., n = 8 for
CGH2466; n = 5 for tracazolate; n = 5 for SLV320 for left column and n = 3
for right column) were analysed using a four-parameter logistic function
and the standard allosteric operational model (not shown). Allosteric
parameters in absence of Ro 20-1724 are summarized in Supplementary
Table 8. For each pair of fittings, the proton potency value (negative
logarithm of the half-maximum effective concentration (pEC50)) from
the agonist concentration-response curve (right) in the absence of testing
compound was used as the pKA for the allosteric operational model (left).
d, Schematic showing the shared pharmacology among GABA-A, adenosine
GPCRs and GPR68 ligands. Molecules along each edge of the triangle have
been shown to have activity at both targets, whereas tracazolate, in the
middle, shows activity at all three.
for nonspecific activity (b); at GPR65 when receptors were activated at
pH 7.40 for modulator or antagonist activity (c); at GPR65 when receptors
concentration-responses (e), BTB09089 concentration-responses (f),
g) at GPR65 mutant
Extended Data Figure 11 | In vivo behavioural profiling of GPR68-
knockout mice. a, b, No effects of GPR68 deletion on distance travelled in
an open field. Data represent mean ± s.e.m. for each group for a one-hour
test session. c, d, No difference in latency to fall from an accelerating
rotarod. Data represent mean ± s.e.m. for each group. e-h, Decreased
startle responses in GPR68 knockout mice after presentation of acoustic
stimuli (e, f). Data represent mean ± s.e.m. for each group. No effects
of genotype were found for levels of prepulse inhibition (g, h). Data
represent mean ± s.e.m. for each group (*P < 0.05). i-l, No difference in
acquisition and reversal learning in the Morris water maze. Data represent
mean ± s.e.m. of four trials per day. Subject numbers were 9 wild-type and
7 knockout male mice, and 12 wild-type and 11 knockout female mice.
Extremely metal-poor stars from the cosmic dawn
in the bulge of the Milky Way


L. M. Howes, A. R. Casey, M. Asplund, S. C. Keller, D. Yong, D. M. Nataf, R. Poleski, K. Lind, C. Kobayashi,
C. I. Owen, M. Ness, M. S. Bessell, G. S. Da Costa, B. P. Schmidt, P. Tisserand, A. Udalski, M. K. Szymański,
I. Soszyński, G. Pietrzyński, K. Ulaczyk, Ł. Wyrzykowski, P. Pietrukowicz, J. Skowron, S. Kozłowski & P. Mróz


The first stars are predicted to have formed within 200 million
years after the Big Bang¹, initiating the cosmic dawn. A true first
star has not yet been discovered, although stars²⁻⁴ with tiny amounts
of elements heavier than helium (‘metals’) have been found in the
outer regions (‘halo’) of the Milky Way. The first stars and their
immediate successors should, however, preferentially be found today
in the central regions (‘bulges’) of galaxies, because they formed in
the largest over-densities that grew gravitationally with time⁵,⁶. The
Milky Way bulge underwent a rapid chemical enrichment during
the first 1-2 billion years⁷, leading to a dearth of early, metal-poor
stars⁸,⁹. Here we report observations of extremely metal-poor stars
in the Milky Way bulge, including one star with an iron abundance
about 10,000 times lower than the solar value without noticeable
carbon enhancement. We confirm that most of the metal-poor
bulge stars are on tight orbits around the Galactic Centre, rather
than being halo stars passing through the bulge, as expected for stars
formed at redshifts greater than 15. Their chemical compositions
are in general similar to typical halo stars of the same metallicity,
although intriguing differences exist, including lower abundances
of carbon.

Stars with a low content of heavy elements have distinct spectral
flux distributions, which are reflected in their colours. Using the
photometric filter system on the SkyMapper telescope operated by the
Australian National University, it is possible to identify metal-poor
candidate stars¹⁰ in the Galactic halo⁴ and bulge⁹. We have observed
~14,000 bulge stars preselected from SkyMapper photometry using the
AAOmega spectrograph on the Anglo-Australian Telescope (AAT),
which enables the acquisition of 400 simultaneous stellar spectra over
a 2-degree field of view. More than 500 stars with an iron abundance
less than 1/100th of the solar value have been identified, making our
survey the first to successfully target metal-poor stars in the Milky Way
bulge. Twenty-three of these stars, targeted as the most metal-poor
ones on the basis of the intermediate resolution spectra (Extended Data
Table 1), were observed in June 2014 with the MIKE high-resolution
spectrograph on the 6.5-m Magellan Clay telescope¹¹ to enable a
comprehensive determination of their chemical compositions (Fig. 1).

The stars’ effective temperatures were derived through fitting the
observed hydrogen lines with theoretical spectra, while neutral and
ionized iron lines provided measurements of the surface gravities and
metallicities in the framework of 1D stellar atmosphere models¹² and
non-equilibrium spectral line formation¹³ (Extended Data Table 2).
All 23 stars were found to have [Fe/H] < −2.3, including nine stars
with [Fe/H] < −3 (here $[\mathrm{A/B}] = \log_{10}(N_\mathrm{A}/N_\mathrm{B})_{\star} - \log_{10}(N_\mathrm{A}/N_\mathrm{B})_{\odot}$,
where $N_\mathrm{A}/N_\mathrm{B}$ refers to the number ratio of atoms of elements A
and B in the star ($\star$ subscript) and the Sun ($\odot$ subscript)).
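
The bracket notation defined above is a logarithmic ratio relative to the Sun, so it converts directly into a linear deficiency factor. The short sketch below illustrates this; the function names are hypothetical and the input number ratios are arbitrary illustrative values, not measurements from this work.

```python
import math


def bracket_abundance(ratio_star, ratio_sun):
    """[A/B] from the number ratio N_A/N_B in the star and in the Sun."""
    return math.log10(ratio_star) - math.log10(ratio_sun)


def deficiency_factor(bracket_value):
    """Linear factor by which the ratio lies below the solar ratio."""
    return 10.0 ** (-bracket_value)


# A star whose Fe/H number ratio is 1,000 times below solar has [Fe/H] = -3.
example = bracket_abundance(1.0e-3 * 3.0e-5, 3.0e-5)  # illustrative ratios only

# [Fe/H] = -3.94 corresponds to a deficiency of several thousand,
# of the order of the "about 10,000 times lower" quoted in the abstract.
factor = deficiency_factor(-3.94)
```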


The most metal-poor star, SMSS J181609.62−333218.7, has
[Fe/H] = −3.94 ± 0.16. The abundances of an additional 22 elements
were determined spectroscopically, including the α-elements Mg, Si,
Ca, and Ti, and the neutron capture elements Y, Zr, and Ba (Extended
Data Tables 3, 4, 5).

To confirm their bulge membership, the distances and orbits of
the stars have been determined. Using the spectroscopic temperatures
and surface gravities, and an assumed mass of 0.8 M⊙, distances
were inferred, which in nearly all cases are consistent with the stars being
located within the bulge (Fig. 2). We have measured velocities for ten
of our stars using observations taken by the OGLE-IV survey¹⁴, from
which orbits around the Galaxy have been determined in combination
with their distances and velocities (Extended Data Table 6); the
remaining stars fall outside the OGLE footprint, while other sources of
kinematic information are too uncertain to constrain the orbits
sufficiently. Seven out of the ten stars with accurate kinematics are shown
to have tightly bound orbits, placing them in the inner regions of the
Milky Way (Fig. 2). In particular, using a cut-off radius of 3.43 kpc as
the radius of the bulge component¹⁵, the most metal-poor star SMSS
J181609.62−333218.7 has an orbit entirely contained within the bulge.
Only two out of the ten stars are on much larger orbits, being merely
halo stars currently passing through the bulge region. Extending these
numbers to the whole sample, we can expect ~14 of the 23 bulge stars
analysed here to have orbits fully within the central regions of the Milky
Way; with the imminent arrival of kinematic data from the Gaia satellite,
accurate orbits can be determined for all of the bulge stars.

The very first stars are predicted to have brought about the cosmic
dawn by forming in the centres of the largest dark matter mini-haloes,
which subsequently accreted material to become the inner regions of
the largest galaxies¹⁶. The typical redshift of formation for stars in the
bulge with [Fe/H] < −1 is z = 10, in contrast to z ~ 5 for halo stars. Of
the stars with [Fe/H] < −3, approximately 15% are expected to have
formed at z > 15 (refs 5, 6). Of the ten stars with accurate orbit
information, half of them have binding energies E_tot < −8 × 10⁴ km² s⁻²,
which is consistent with a formation redshift of z > 15 (ref. 5). Low
binding energies imply that the stars have been in the Galactic potential
well for some time and it is very unlikely they have been accreted from
a recent dwarf spheroidal merger. Their low metallicities, orbits and
binding energies make these stars prime candidates for being direct
descendants of the very first stars, probing a cosmic epoch that is
otherwise completely inaccessible at present. Direct age determinations of these
ancient and extremely metal-poor bulge stars from comparison with
stellar evolutionary tracks or radioactive U or Th dating are currently
not possible, but asteroseismic ages could possibly be inferred with the
extended Kepler mission or future satellites.
Figure 1 | Extracts of the spectrum of the lowest-metallicity star in
our sample. a, A section of the spectrum of SMSS J181609.62−333218.7
(black line), the most metal-poor bulge star known. In blue is the predicted
spectrum with the inferred stellar parameters (effective temperature
T_eff = 4,809 K, log(g) = 1.93 (here surface gravity g is in cgs units),
[Fe/H] = −3.94, [Mg/Fe] = 0.20), and the red and green lines show spectra
with all abundances scaled to ±0.15 dex, respectively. All three predicted
spectra were created using the 1D local thermodynamic equilibrium (LTE)
spectrum synthesis programme, MOOG. b, The Hβ line of the same
star, compared to three synthetic spectral line profiles computed with
T_eff = 4,640 K (red, dash-dot), 4,800 K (purple, continuous), and 4,960 K
(blue, dashed).


Given their extremely low metallicities and large formation redshifts,
these stars are likely to have formed from gas polluted by ejecta
from a single or at most a few supernovae of the first stellar generation.
A chemical composition analysis has been carried out to search for
tell-tale nucleosynthetic signatures and possible differences from halo
stars at the same metallicities. For most elements, the chemical
compositions of the 23 bulge stars are consistent with typical halo stars,
suggesting enrichment by similar supernovae in spite of the distinct
environments and formation redshifts. Subtle differences do exist,
however, most notably in terms of the carbon abundances. None of the
23 stars have the large observed carbon enhancements that occur
frequently in halo stars. Applying evolutionary corrections to the
surface carbon abundance to counter the mixing that occurs with
material processed by H-burning through CNO-cycling at late stages of
the stellar lifetime¹⁷, still only one of the stars would have had a natal
[C/Fe] > 1. In the halo, the percentage of stars that are carbon-enhanced
increases dramatically at lower metallicities, from 27% of stars with
[Fe/H] < −2 up to 69% with [Fe/H] < −4 (ref. 17). From the
literature data on halo stars with similar iron abundances to our stars¹⁷, the
probability of selecting at most one carbon-enhanced star out of 23
halo stars is only 0.2%. Carbon-enhanced stars come in two varieties:
those with and those without large excesses of neutron-capture
elements. The former are most likely to have been formed by mass-transfer
from a binary companion that underwent the asymptotic giant branch
phase. Those carbon-enhanced stars with neutron-capture excesses
occur most frequently at metallicities of [Fe/H] > −3, whereas those
without do not appear to have binary companions, and are more
common at the very lowest metallicities. As none of our bulge stars are
classified as having large abundances of neutron capture elements,
the likelihood of finding one such carbon-enhanced star out of 23 is
7% if the frequency is the same for the bulge as for the halo. A lower
Figure 2 | The Galactic positions and orbits of the 23 stars observed at
high resolution. a, Surface density map of a model of the Galactic bulge
projected onto the X-Z (top) and X-Y (bottom) planes, where X, Y,
and Z are Cartesian coordinates with the origin at the Galactic Centre
and Z perpendicular to the plane of the Galaxy. Plotted over this (filled
black circles) are the 23 stars of this study, with distance uncertainties
shown as error bars, and a circle of radius 3.43 kpc (white: the cut-off
radius of the inner bulge determined from 2MASS data¹⁵). The position
of the Sun is shown with a red diamond, at 8.5 kpc from the Galactic
Centre. b, Projections of the orbit of the lowest metallicity star, SMSS
J181609.62−333218.7, both in the (R, Z) plane (right), where R is the
radial direction, and in the plane of the orbit itself (left).


frequency of carbon-enhanced stars in the bulge relative to the halo is
contrary to theoretical predictions; the expected dependence of the
initial mass function on the cosmic microwave background¹⁸ would result
in a greater number of carbon-enhanced stars near the centre of the
Galaxy.
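
The probabilities quoted above for drawing few carbon-enhanced stars from a sample of 23 are of the kind given by a cumulative binomial distribution. The sketch below is a minimal illustration under that assumption; the function name is hypothetical, and the carbon-enhanced fraction passed in is an assumed input. The 27% halo fraction quoted for [Fe/H] < −2 is used purely as an example, so the result need not match the 0.2% derived from literature data matched in iron abundance.

```python
from math import comb


def prob_at_most(k_max, n, p):
    """Probability of at most k_max successes in n independent draws,
    each with success probability p (cumulative binomial distribution)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_max + 1))


# Illustrative only: chance of finding at most one carbon-enhanced star
# among 23 if each star independently had a 27% chance of being enhanced.
p_example = prob_at_most(1, 23, 0.27)
```

A higher assumed fraction, as expected at the lowest metallicities, makes such an outcome correspondingly less likely.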

The most metal-poor bulge star, SMSS J181609.62−333218.7, is at
least an order of magnitude more iron-deficient than previously found
low-metallicity bulge stars⁹,¹⁰. We have not been able to detect C in
its spectrum, instead finding only an upper limit to its C abundance
(Extended Data Fig. 1). This makes the upper limit on its total metallicity
[Z/H] = −3.8 (total mass fraction of Z ~ 2.1 × 10⁻⁶), where Z
represents the sum of all metals, placing it amongst the four most
metal-poor stars known, along with the halo star SDSS J102915+172927
(ref. 3). The low C abundances measured in both these stars fall below the predicted


26 NOVEMBER 2015 | VOL 527 | NATURE | 485 


© 2015 Macmillan Publishers Limited. All rights reserved 



Figure 3 | Chemical abundances of the 23 stars observed at high
resolution. a, The abundance ratio of carbon versus iron ([C/Fe]), with
respect to metallicity ([Fe/H]) measured in the observed stars (filled
red circles, red arrow for an upper limit). The dotted lines represent
the solar abundances. Also shown for comparison are literature metal-
poor halo giants (small black dots) and more metal-rich bulge stars
(filled blue triangles). b, The chemical abundance pattern of SMSS
J181609.62−333218.7, for elements X, where X is displayed at the top of
the figure. Each determined abundance is shown as an open black star.
These abundances are compared to three synthetic supernovae yields: a
pair-instability supernova of 170 M⊙ (PISN; blue, short dash), a core-
collapse supernova of 15 M⊙ (SN; green, long dash), and a hypernova of
40 M⊙ (HN; red, solid). Dashed grey arrows represent expected non-LTE
corrections; solid arrows represent measurements where only an upper
or lower limit was possible. The error bars in a and b are estimates of the
uncertainties in our measurements, calculated as described in Methods.


metallicity limit for formation of low-mass stars due to metal line
cooling.

We have compared the detailed chemical abundance pattern of
SMSS J181609.62−333218.7 to primordial supernovae yields
(Fig. 3). In particular, the low Mg and Ca abundance, but higher Si
abundance, and the absence of a pronounced odd-even abundance
pattern rule out the possibility of enrichment by a pair-instability
supernova resulting from a primordial star of (140-250) M⊙. Low
abundances of Cr and Mn and of α-elements, combined with the
higher abundance of Co, indicate that the polluting supernova was
most likely to have been a primordial hypernova, an extremely
energetic kind of supernova releasing ten times the kinetic energy of
regular core-collapse supernovae, possibly due to the forming black
hole having larger angular momentum. Good agreement is found
for a 40 M⊙ hypernova; a more stringent Zn limit would further
constrain the mass range. Unusual abundance ratios have been found in
small numbers of stars in the halo (4% of halo stars with low carbon
abundances have chemical peculiarities in at least two elements),
but none so far appear to have been polluted by a 40 M⊙ hypernova.
A low [α/Fe] ratio (0.14 dex; here α indicates α-elements) at such
low [Fe/H] is consistent with an inhomogeneous enrichment from
such supernovae, while stars with higher [α/Fe] formed from more
well-mixed gas due to a longer time delay in forming the second
generation of stars.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Observations. Photometry of the Milky Way bulge was acquired for the EMBLA
survey during the commissioning period of the SkyMapper telescope in 2012 and
2013. Stars were selected from the photometry using a combination of the g, i, and
v bandpasses, designed to give a reliable metallicity indicator.

Spectroscopic follow-up observations took place during 2012-14, making
use of the AAOmega+2dF multi-object spectrograph on the Anglo-Australian
Telescope. With between 350 and 400 stars observed in each field, spectra of more
than 14,000 bulge stars have been obtained. The gratings used have a spectral
resolving power of 1,300 in the blue (370-580 nm) and 10,000 in the red
(840-885 nm). The data were reduced using the standard 2dfdr pipeline. Stellar
spectra were fitted using a generative model that simultaneously accounts for
stellar parameters (by interpolating from the AMBRE grid), continuum, spectral
resolution and radial velocity.

From the first two years of spectroscopic data, more than 50 stars were
identified as having [Fe/H] < −2.5. The high-resolution spectroscopic data of 23 stars
presented in this Letter are the result of observations using the MIKE spectrograph
at the Magellan Clay telescope¹¹ on 15-17 June 2014. All observations were taken
using a slit width of 0.7″, resulting in a resolving power of 35,000 in the blue and
31,000 in the red. The data were reduced using the CarPy data reduction pipeline,
before they were normalized and summed together using the SMH software. The
final spectra cover 330-890 nm.
Parameter and abundance determination. The stellar parameters (Extended
Data Table 2) were calculated iteratively, using the original parameters from
the low-resolution spectra as initial guesses. First, effective temperatures were
derived by fitting the wings of the Balmer Hα and Hβ lines with a synthetic
profile (Fig. 1). These profiles were created by linearly interpolating between a
grid of synthetic spectra. The best lines were fitted by a χ² minimization, using
a weighted average of the two lines; weighting was double on the Hβ line, due
to predicted LTE effects being larger for Hα (ref. 34). The difference between
the temperatures calculated for each line was on average only 26 K. The log(g),
microturbulence ξ, and [Fe/H] were then derived for that temperature, by forcing
the Fe I abundance to remain constant with respect to reduced equivalent width,
and equilibrium between the Fe I and Fe II abundances. Fe I and Fe II abundances
were measured from the equivalent widths of a maximum of 66 Fe I lines and
24 Fe II lines (in the case of the most metal-poor star, SMSS J181609.62−333218.7,
these numbers are reduced to 10 Fe I lines and 4 Fe II lines). Finally, a non-LTE
correction is applied to the Fe I abundance, calculated by taking the average
of the line-by-line corrections¹³. This correction forces an offset between the
Fe I and Fe II abundances, thus replacing the initial equilibrium. This process
is repeated until the parameters converge on a solution. Throughout we use
the 1D MARCS model atmospheres¹², and a shortened version of the Gaia-
ESO line list, with extra lines supplemented from ref. 35 due to our wider
wavelength coverage. The stellar abundances are referenced to the solar
abundances of ref. 36. This analysis method was tested on seven halo stars from
the literature, and the offsets found were T_eff = +28 K, log(g) = −0.2, and
[Fe/H] = −0.08 (literature values minus our values).
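
One simple reading of the double weighting on Hβ described above is a weighted mean of the two single-line temperatures. The sketch below illustrates that reading only; the function name and the example temperatures are hypothetical, and the actual procedure may instead weight the χ² contributions of the two lines during the fit.

```python
def combined_teff(teff_halpha, teff_hbeta, weight_halpha=1.0, weight_hbeta=2.0):
    """Weighted mean of the temperatures from the two Balmer lines,
    with H-beta counted twice as heavily as H-alpha by default."""
    total_weight = weight_halpha + weight_hbeta
    return (weight_halpha * teff_halpha + weight_hbeta * teff_hbeta) / total_weight


# Illustrative inputs differing by 30 K, similar in size to the 26 K
# average difference between the two lines quoted in the text.
teff_example = combined_teff(4800.0, 4830.0)
```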

The abundances were measured using the equivalent widths of atomic lines
(that were all on the linear part of the curve-of-growth), except in the case of C
(measured from the C-H molecular bands at 431.3 nm and 432.3 nm) and Ba
(synthesized in order to account for hyperfine splitting). Non-LTE corrections
were calculated for Li (ref. 37), Na (ref. 38), Mg, and Ca, and applied to the
individual line abundances. The literature halo abundances of Mg and Ca (ref.
25) shown in Fig. 3 have also had a non-LTE correction applied, in order to ensure
a fair comparison. 3σ upper limits were derived for some elements in those
stars where the lines were too weak to be detected (Extended Data Table 3).
The abundance offsets compared to the literature values averaged 0.10 ± 0.19
across those elements measured in common. Owing to wavelengths covered in
the SkyMapper metallicity filter, it is possible that stars with extremely high C
abundance appeared to be more metal-rich, and so were not selected. However,
a similar study of metal-poor stars discovered in the halo with SkyMapper
found the fraction of C-enhanced stars was identical to that reported in previous
surveys. Furthermore, we followed up 14,000 stars with intermediate resolution
spectra, and determined metallicities using those spectra. The majority of the
stars observed had [Fe/H] ≈ −1.0 and included some that had solar metallicities,
so it is highly unlikely that we missed any C-enhanced extremely metal-poor
stars in our selection.

The systematic uncertainties in the temperature determinations were estimated
to be ±100 K, and the statistical uncertainties averaged ±125 K, so when these are
combined in quadrature we conclude the total uncertainty to be ±160 K. The ξ
uncertainties are estimated to be ±0.2, mostly due to systematics.
The standard errors of the individual line abundances of Fe I and Fe II were
combined in quadrature to evaluate the log(g) uncertainties. The differences
between the [Fe/H] values when varying the temperature, surface gravity, and
microturbulence by their respective errors were combined in quadrature with the
standard error of the Fe II lines to produce the [Fe/H] uncertainties. The individual
abundance errors were also calculated using this method, using the standard error
of the individual abundances for the lines of that particular element.
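
The quadrature combinations described above amount to taking the square root of the sum of squared, independent error terms. A minimal sketch, with a hypothetical function name, reproduces the quoted temperature uncertainty from its systematic and statistical parts:

```python
import math


def combine_in_quadrature(*errors):
    """Total uncertainty from independent error terms added in quadrature."""
    return math.sqrt(sum(e * e for e in errors))


# Systematic (100 K) and statistical (125 K) temperature uncertainties
# combine to about 160 K, as quoted in the text.
total_teff_error = combine_in_quadrature(100.0, 125.0)
```

The same function applies to the [Fe/H] case by passing the abundance shifts from each varied parameter together with the standard error of the lines.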

For SMSS J181609.62−333218.7, which has [Fe/H] = −3.94, a measurement
of Na was not possible, owing to the 818.3 nm and 819.4 nm lines being too weak,
and the Na D lines (588.9 nm and 589.5 nm) being partly blended with
interstellar Na lines. We have derived a range of possible values for this star, taking the
upper limit from the non-detection at 819.4 nm, and the lower limit from fitting
a Gaussian to the Na D lines, taking into account the interstellar Na.

Distances and orbital parameters. Distances to the stars were calculated by
comparing the absolute and apparent bolometric magnitudes. The absolute magnitudes
were recovered from the relation $M_{\mathrm{bol},\star} = M_{\mathrm{bol},\odot} - 2.5\log(L_\star/L_\odot)$, where the
luminosities are calculated using $L_\star/L_\odot = 4\pi\sigma T_\star^{4} M_\star G/10^{\log g_\star}$, taking
$M_\star = (0.8 \pm 0.2)\,M_\odot$ for all stars. The apparent bolometric magnitudes are reconstructed
from the 2MASS JHKs magnitudes (Extended Data Table 1), assuming reddening⁴¹
(as no more-recent reddening catalogue covers all 23 stars), via the methodology
of ref. 42. The proper motions are based on I band images taken during the
OGLE-IV¹⁴ observations of the Galactic bulge. Relative proper motions were
derived from multiple epochs of data for each field⁴³, and the uncertainties are a
combination of statistical and systematic (for each star, the systematic uncertainty
is estimated to be ~0.4 mas yr⁻¹). These were converted into absolute proper
motions by adding the predicted average bulge motion for each field, calculated
using the Besançon Galaxy model⁴⁴. The orbits were calculated using the Python
package galpy⁴⁵ and the galactic potential assumed in these calculations was a
three-component Milky Way-like potential⁴⁵. To model the uncertainty
distributions, we sampled 1,000 orbits using a Monte Carlo simulation, assuming a normal
distribution for the uncertainties of the input parameters. The results of this are
included in Extended Data Table 6. One star, SMSS J175455.52−380339.3, has an
unbound E_tot and impractically large orbital parameters, suggesting that one or
more of our input parameters need to be changed.
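
The distance estimate described above can be sketched numerically. The luminosity follows from the Stefan-Boltzmann law with the radius eliminated through the surface gravity (g = GM/R²), written here relative to the Sun; the distance then follows from the distance modulus. The solar reference values below (effective temperature, log g and bolometric magnitude) are assumed nominal values rather than those adopted in this work, the function names are hypothetical, and the apparent bolometric magnitude is taken to be already corrected for reddening.

```python
import math

# Assumed nominal solar reference values (not taken from this work).
TEFF_SUN = 5772.0   # K
LOGG_SUN = 4.438    # log10 of surface gravity in cgs units
MBOL_SUN = 4.74     # absolute bolometric magnitude


def luminosity_solar(teff, logg, mass_solar=0.8):
    """L/L_sun from L = 4*pi*sigma*T^4*G*M/g, expressed as a solar ratio."""
    return mass_solar * (teff / TEFF_SUN) ** 4 * 10.0 ** (LOGG_SUN - logg)


def absolute_bolometric_magnitude(lum_solar):
    """M_bol of the star from its luminosity in solar units."""
    return MBOL_SUN - 2.5 * math.log10(lum_solar)


def distance_kpc(m_bol_apparent, m_bol_absolute):
    """Distance from the distance modulus m - M = 5*log10(d / 10 pc)."""
    return 10.0 ** ((m_bol_apparent - m_bol_absolute + 5.0) / 5.0) / 1000.0


# Parameters of the most metal-poor star as listed in Extended Data Table 2.
lum = luminosity_solar(4809.0, 1.93, 0.8)
m_abs = absolute_bolometric_magnitude(lum)
```

Sampling the inputs from normal distributions and repeating this calculation would propagate their uncertainties in the same Monte Carlo spirit as used for the orbits.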

Code availability. All codes used to analyse the data presented are publicly
available. In particular, the 1D LTE analysis used was made possible with the line analysis
and spectrum synthesis code MOOG.
Extended Data Figure 1 | The C-H band of SMSS J181609.62−333218.7. The C-H band is used to derive an upper limit for C in our most metal-poor
star, SMSS J181609.62−333218.7. Synthetic spectra with abundances of [C/Fe] = 0.06 (blue) and [C/Fe] = 0.56 (red) are shown for comparison.
Extended Data Table 1 | Coordinates and 2MASS photometry of the 23 stars observed


Name (SMSS)          RA (°)   Dec (°)  l (°)  b (°)  J (mag)  H (mag)      Ks (mag)
J173823.38-145701.1  264.597  -14.950  11.1   8.7    10.85    10.22        10.03
J182048.26-273329.2  275.201  -27.558  5.0    -6.1   12.94    12.42        12.25
J183744.90-280831.1  279.437  -28.142  6.2    -9.7   12.29    11.69        11.33
J183647.89-274333.1  279.200  -27.726  6.5    -9.3   10.68    10.03        9.77
J183812.72-270746.3  279.553  -27.130  7.1    -9.3   13.38    12.79        12.61
J183719.09-262725.0  279.330  -26.457  7.7    -8.9   12.79    12.19        12.03
J184201.19-302159.6  280.505  -30.367  4.5    -11.5  14.52    14.08        14.00
J184656.07-292351.5  281.734  -29.398  5.9    -12.0  13.12    12.61        12.51
J181406.68-313106.1  273.528  -31.518  0.8    -6.6   12.12    11.56        13.30
J181317.69-343801.9  273.324  -34.634  357.9  -7.9   13.09    12.55        12.50
J181219.68-343726.4  273.082  -34.624  357.9  -7.7   12.80    12.28        12.15
J181609.62-333218.7  274.040  -33.539  359.2  -7.9   13.39    12.84        12.71
J181634.60-340342.5  274.144  -34.062  358.8  -8.3   12.56    11.99        11.90
J175544.54-392700.9  268.936  -39.450  352.0  -7.1   13.71    13.19        13.09
J175455.52-380339.3  268.731  -38.061  353.1  -6.3   11.98    11.39        11.26
J175746.58-384750.0  269.444  -38.797  352.8  -7.2   13.09    12.60        12.51
J181736.59-391303.3  274.402  -39.218  354.2  -10.8  12.06    11.54        11.37
J181505.16-385514.9  273.772  -38.921  354.2  -10.2  13.63    13.15        13.09
J181921.64-381429.0  274.840  -38.241  355.2  -10.6  13.64    13.13        13.03
J175722.68-411731.8  269.345  -41.292  350.5  -8.3   13.85    (illegible)  (illegible)
J175021.86-414627.1  267.591  -41.774  349.4  -7.4   11.74    11.23        11.17
J175636.59-403545.9  269.152  -40.596  351.1  -7.9   12.86    12.29        12.19
J175433.19-411048.9  268.638  -41.180  350.4  -7.8   11.94    11.43        11.33


RA, right ascension; Dec., declination; l and b, Galactic longitude and latitude, respectively.
Extended Data Table 2 | Stellar parameters of the 23 stars observed


Name                 v_helio    d⊙     T_eff  log g  [Fe/H]  ξ          [α/Fe]
(SMSS)               (km s⁻¹)   (kpc)  (K)    (cgs)  (dex)   (km s⁻¹)   (dex)
J173823.38-145701.1  46.1       8.5    4599   0.99   -3.36   2.30       0.12
J182048.26-273329.2  51.5       6.0    4949   2.22   -3.48   1.90       0.37
J183744.90-280831.1  -132.6     17.6   4597   0.98   -2.92   2.05       0.33
J183647.89-274333.1  -381.4     6.6    4649   1.17   -2.48   2.50       0.30
J183812.72-270746.3  155.3      12.3   4873   1.74   -3.22   1.81       -0.01
J183719.09-262725.0  -244.7     10.0   4791   1.64   -3.18   1.81       0.32
J184201.19-302159.6  171.8      9.6    5136   2.55   -2.84   1.96       0.30
J184656.07-292351.5  91.0       9.5    4857   1.93   -2.76   1.83       0.34
J181406.68-313106.1  4.9        9.3    4821   1.48   -2.82   1.96       0.22
J181317.69-343801.9  139.3      6.5    5015   2.25   -2.28   1.48       0.41
J181219.68-343726.4  -386.2     8.0    4873   1.94   -2.50   1.93       0.32
J181609.62-333218.7  27.4       10.4   4809   1.93   -3.94   1.60       0.14
J181634.60-340342.5  -170.3     10.5   4821   1.61   -2.46   1.79       0.06
J175544.54-392700.9  -279.6     13.5   4857   1.83   -2.65   1.60       0.32
J175455.52-380339.3  23.5       13.5   4714   1.10   -3.36   1.80       0.08
J175746.58-384750.0  -59.4      9.1    5064   1.96   -2.81   2.36       0.29
J181736.59-391303.3  -177.9     15.7   4612   1.05   -2.59   2.09       0.32
J181505.16-385514.9  202.1      5.0    4962   2.73   -3.29   2.10       0.35
J181921.64-381429.0  -97.7      11.2   4917   2.02   -2.72   1.94       0.30
J175722.68-411731.8  63.8       12.4   4894   1.97   -2.88   2.02       0.19
J175021.86-414627.1  181.4      4.1    5015   2.12   -2.60   1.55       0.30
J175636.59-403545.9  -28.8      9.8    4934   1.79   -3.21   1.96       0.20
J175433.19-411048.9  -229.3     5.6    4912   1.91   -3.26   1.94       0.35


Symbols: v_helio, heliocentric velocity; d⊙, distance from the Sun to the star; T_eff, effective temperature; log(g), stellar surface gravity; ξ, microturbulence; [α/Fe] = ([Mg/Fe] + [Ca/Fe] + [Ti/Fe])/3.
Average uncertainties: velocity, 1.0 km s⁻¹; distance, 3.0 kpc; temperature, 160 K; microturbulence, 0.2 dex; log(g), 0.14 dex; [Fe/H], 0.09 dex; [α/Fe], 0.13 dex.
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Extended Data Table 3 | Chemical abundances measured for each star, C to Ca 


Name (SMSS) A(Li) [C/Fe] [Na/Fe] [Mg/Fe] [AIl/Fe] [Si/Fe] [K/Fe] [Ca/Fe] 
J173823.38-145701.1 0.49 0.04 0.17 -0.78 O27 0.12 
J182048.26-273329.2 0.98 -0.28 0.54 -0.63 0.96 0.30 
J183744.90-280831.1 0.16 -0.20 -0.28 0.44 -0.52 0.58 0.36 0.25 
J183647.89-274333.1 -0.47 -0.24 0.33 -0.66 0.51 0.18 
J183812.72-270746.3 0.93 0.22 -0.39 0.05 -1.23 0.14 0.03 
J183719.09-262725.0 0.40 -0.19 0.47 -0.77 0.36 0.41 0.25 
J184201.19-302159.6 0.34 -0.38 0.26 -0.89 0.38 0.53 0.37 
J184656.07-292351.5 1.04 0.08 -0.30 0.41 -0.95 0.36 0.58 0.28 
J181406.68-313106.1 -0.51 0.18 0.23 -0.94 0.32 0.16 
J181317.69-343801.9 1.05 0.17 -0.33 0.53 -0.82 0.33 0.63 0.34 
J181219.68-343726.4 1.01 0.19 -0.22 0.30 -0.86 0.25 0.31 
J181609.62-333218.7 <0.06 -0.01<0.91° 0.20 -1.08 0.54 0.00 
J181634.60-340342.5 -0.10 -0.53 0.05 -1.08 0.11 0.21 0.03 
J175544.54-392700.9 0.87 0.12 -0.32 0.29 -0.88 0.34 0.36 0.29 
J175455.52-380339.3 -0.64 0.06 -0.88 0.30 0.03 
J175746.58-384750.0 -0.04 0.37 -1.10 0.44 0.24 
J181736.59-391303.3 -0.28 -0.11 0.38 -0.69 0.53 0.53 0.26 
J181505.16-385514.9 0.23 -0.23 0.21 -0.96 0.15 0.36 
J181921.64-381429.0 1.04 0.32 -0.24 0.28 -0.82 0.54 0.44 0.26 
J175722.68-411731.8 0.42 -0.42 0.21 -0.70 0.48 0.14 0.13 
J175021.86-414627.1 0.98 0.23 -0.37 0.30 -0.82 0.42 0.28 
J175636.59-403545.9 0.93 0.65 0.30 -0.76 0.45 0.11 
J175433.19-411048.9 0.92 0.24 -0.03 0.40 -0.74 0.43 0.44 0.32 


A(Li) is the logarithmic abundance of lithium. All abundances are derived using LTE, except for Li, Na, Mg, and Ca, where non-LTE corrections have been applied. Average uncertainties: Li, 0.20; C, 0.25; 
Na, 0.20; Mg, 0.16; Al, 0.22; Si, 0.21; K, 0.17; Ca, 0.12. 
0.01 is the lower limit, and 0.91 is the upper limit; see Methods for details. 
Extended Data Table 4 | Chemical abundances measured for each star, Sc to Cu 


Name (SMSS) [Sc/Fe] [Ti/Fe] [Cr/Fe] [Mn/Fe] [Co/Fe] [Ni/Fe] [Cu/Fe] 
J173823.38-145701.1 -0.09 -0.22 -0.80 G.1F -0.21 <0.96 
J182048.26-273329.2 0.16 -0.51 -0.97 0.24 -0.33 <1.33 
J183744.90-280831.1 0.04 0.20 -0.23 -0.38 0.36 0.14 <0.29 
J183647.89-274333.1 0.14 0.34 -0.27 -0.35 0.01 0.02 -0.43 
J183812.72-270746.3 -0.20 -0.51 -0.32 0.22 -0.08 <1.06 
J183719.09-262725.0 0.18 0.14 -0.33 -0.34 0.23 0.23 <1.10 
J184201.19-302159.6 -0.03 0.20 -0.24 -0.57 0.35 -0.02 <0. ¢2 
J184656.07-292351.5 0.11 0.28 -0.19 -0.31 0.11 0.07 <0.45 
J181406.68-313106.1 0.08 0.19 -0.30 -0.60 0.22 0.09 <0.50 
J181317.69-343801.9 0.11 0.34 -0.19 -0.08 0.12 0.10 <1.13 
J181219.68-343726.4 0.18 0.31 -0.15 -0.14 0.22 0.22 <0.17 
J181609.62-333218.7 0.13 -0.65 -1.28 0.13 -0.11 <1.57 
J181634.60-340342.5 -0.26 0.04 -0.24 -0.35 -0.20 -0.03 <-0.05 
J175544.54-392700.9 -0.05 0.32 -0.22 -0.18 0.24 -0.05 <0.32 
J175455.52-380339.3 0.02 -0.47 -1.05 0.07 -0.12 <0.85 
J175746.58-384750.0 0.12 0.19 -0.44 -0.59 0.09 -0.01 <0.79 
J181736.59-391303.3 0.15 0.24 -0.31 -0.34 0.00 0.08 <0.10 
J181505.16-385514.9 0.40 0.38 -0.54 -0.81 0.17 -0.11 <12 
J181921.64-381429.0 0.01 0.32 -0.17 -0.44 0.33 0.25 <0.50 
J175722.68-411731.8 -0.22 0.15 -0.31 -0.48 -0.04 0.04 <0.82 
J175021.86-414627.1 -0.23 0.26 -0.27 -0.31 0.16 0.16 <0.27 
J175636.59-403545.9 -0.28 0.06 -0.29 -0.72 0.17 -0.26 <0.95 
J175433.19-411048.9 0.22 -0.52 -0.91 0.29 0.02 <1.08 


All abundances are derived using LTE. Average uncertainties: Sc, 0.10; Ti, 0.10; Cr, 0.21; Mn, 0.25; Co, 0.23; Ni, 0.19; Cu, 0.25. 
Extended Data Table 5 | Chemical abundances measured for each star, Zn to Eu 


Name (SMSS) [Zn/Fe] [Sr/Fe] [Y/Fe] [Zr/Fe] [Ba/Fe] [La/Fe] [Eu/Fe] 
J173823.38-145701.1 0.66 0.03 0.02 0.23 -0.04 -0.10 
J182048.26-273329.2 <1.01 -0.47 0.03 <1.33 
J183744.90-280831.1 0.27 -0.29 -0.32 0.03 -0.31 <0.12 
J183647.89-274333.1 0.23 0.18 -0.20 0.45 0.13 0.17 0.82 
J183812.72-270746.3 <0.79 -1.03 -0.70 <0.77 
J183719.09-262725.0 0.48 0.04 0.53 -0.51 <1.03 
J184201.19-302159.6 <1.15 -0.20 0.04 0.70 0.16 <0.94 
J184656.07-292351.5 0.48 -0.26 -0.51 0.14 -0.32 <0.36 
J181406.68-313106.1 0.42 -1.61 -0.72 <0.32 
J181317.69-343801.9 0.17 Gly 0.11 0.34 0.22 -0.09 0.15 
J181219.68-343726.4 0.33 -0.06 -0.11 0.09 0.13 <0.90 0.48 
J181609.62-333218.7 <1.40 -0.85 0.23 <-0.66 <1.09 0.91 
J181634.60-340342.5 0.21 -0.25 -0.65 -0.21 -0.32 -0.14 -0.11 
J175544.54-392700.9 0.36 -0.10 -0.2 0.38 -0.11 -0.15 0.21 
J175455.52-380339.3 0.63 0.47 0.01 0.14 -0.57 <0.66 
J175746.58-384750.0 -0.21 0.04 0.91 0.23 <1.20 0.65 
J181736.59-391303.3 0.23 -0.14 -0.47 0.14 -0.28 <1.19 0.21 
J181505.16-385514.9 <0.95 -0.19 0.14 0.71 0.04 <0.54 0.96 
J181921.64-381429.0 0.42 -0.21 -0.14 0.51 -0.01 0.48 0.59 
J175722.68-411731.8 <0.95 -0.30 -0.30 0.24 -0.19 <0.63 0.52 
J175021.86-414627.1 0.41 -0.14 -0.40 0.25 -0.08 <0.60 0.23 
J175636.59-403545.9 0.86 0.55 0.24 ue al -0.95 <1.21 
J175433.19-411048.9 0.50 -0.81 -0.29 -0.42 <0.91 


All abundances are derived using LTE. Average uncertainties: Zn, 0.10; Sr, 0.20; Y, 0.12; Zr, 0.12; Ba, 0.17; La, 0.15; Eu, 0.16. 
Extended Data Table 6 | Orbital parameters 


Name (SMSS) μ_α cos δ μ_δ Mean r_peri Mean r_ap Mean eccentricity Mean Z_max E_tot 
(mas yr⁻¹) (mas yr⁻¹) (kpc) (kpc) (kpc) (10⁴ km² s⁻²) 

J182048.26-273329.2 -4.10±0.52 -6.38±0.51 0.5 33 ras Bae | Ons. aos Bile ele 
J184201.19-302159.6 -0.38±0.90 -0.82±0.90 127) 66772 72s Ae de uy 
J184656.07-292351.5 1.17±0.89 -2.32±0.89 1.2752 4g 70 065707, 822555 Si a a 
J181406.68-313106.1 2.28±0.52 -8.25±0.52 1.1 +33 sa, O63 2718 -9.5 +38 
J181219.68-343726.4 -2.42±1.14 -1.29±1.14 07 +°8 he arene oot,  FT2aes; 3474 
J181609.62-333218.7 -4.14±0.64 -3.74±0.64 107433 34S O57. Lots oo Bee 
J181634.60-340342.5 1.92±0.62 -0.31±0.62 1.9 +?° ca ae OCs oe ty ee 
J175544.54-392700.9 0.03±1.49 -0.35±1.46 1.7 +73 eae BLT yee soe: case 
J175455.52-380339.3 1.98±1.14 4.76±1.14 55748 is oe |6ele 66a 
J175746.58-384750.0 1.86±1.25 0.17±1.25 1.8 +19 Sa ey (eo - Sa a A ace 


Symbols: μ_α cos δ and μ_δ are the proper motions in equatorial coordinates; r_peri and r_ap are the pericentric and apocentric radii of the orbit, respectively; Z_max is the maximum distance the orbit reaches 
above/below the Galactic plane; and E_tot is the total energy of the orbit. All values given here are the mean values from the Monte Carlo simulation of 1,000 orbits. 
Ubiquitous time variability of integrated stellar populations 


Charlie Conroy¹, Pieter G. van Dokkum² & Jieun Choi¹ 


Long-period variable stars arise in the final stages of the 
asymptotic giant branch phase of stellar evolution. They have 
periods of up to about 1,000 days and amplitudes that can 
exceed a factor of three in the I-band flux. These stars pulsate 
predominantly in their fundamental mode’, which is a function 
of mass and radius, and so the pulsation periods are sensitive to 
the age of the underlying stellar population*. The overall number 
of long-period variables in a population is directly related to their 
lifetimes, which is difficult to predict from first principles because 
of uncertainties associated with stellar mass-loss and convective 
mixing. The time variability of these stars has not previously 
been taken into account when modelling the spectral energy 
distributions of galaxies. Here we construct time-dependent stellar 
population models that include the effects of long-period variable 
stars, and report the ubiquitous detection of this expected ‘pixel 
shimmer’ in the massive metal-rich galaxy M87. The pixel light 
curves display a variety of behaviours. The observed variation of 
0.1 to 1 per cent is very well matched to the predictions of our 
models. The data provide a strong constraint on the properties of 
variable stars in an old and metal-rich stellar population, and we 
infer that the lifetime of long-period variables in M87 is shorter 
by approximately 30 per cent compared to predictions from the 
latest stellar evolution models. 

In typical massive galaxies with ~10¹¹ stars, the variation in the 
total light due to long-period variables will be small, as the summed 
light curves of many such stars effectively cancel each other out (with 
random phases, the net effect scales as N^(−1/2), where N is the number 
of stars). If the light is spread out over many (for example, ~10*-10°) 
pixels, then the number of stars per pixel can range from ~10* to 
10’ and in this regime the number of asymptotic giant branch stars 
per pixel is small and governed by Poisson statistics. This ‘semi- 
resolved’ regime is well known’, and the expected surface brightness 
fluctuations due to Poisson statistics of rare luminous stars have 
been observed and studied in several hundred nearby galaxies**. 
We expect in this regime to be able to detect the presence of variable 
stars through the time dependence of the pixel flux (that is, the pixel 
light curve): essentially, every pixel is expected to ‘shimmer’ on time- 
scales of several hundred days. 
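
The N^(−1/2) cancellation described above can be illustrated with a toy calculation. This is only a sketch and not the model used in this work: it assumes identical stars with sinusoidal light curves of a common period and random phases, and the names `summed_fractional_rms` and `mean_rms`, as well as the default amplitude and sampling, are hypothetical choices made for illustration.

```python
import math
import random


def summed_fractional_rms(n_stars, amplitude=0.5, n_times=16, seed=0):
    """Fractional r.m.s. variation of the summed flux of n_stars variables.

    Toy assumption: every star has mean flux 1 and a sinusoidal modulation
    of the same period and fractional amplitude, with a random phase.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_stars)]
    totals = []
    for k in range(n_times):
        t = 2.0 * math.pi * k / n_times  # sample one full cycle evenly
        totals.append(sum(1.0 + amplitude * math.sin(t + p) for p in phases))
    mean = sum(totals) / n_times
    variance = sum((x - mean) ** 2 for x in totals) / n_times
    return math.sqrt(variance) / mean


def mean_rms(n_stars, n_trials=40):
    """Average the fractional r.m.s. over independent sets of random phases."""
    total = sum(summed_fractional_rms(n_stars, seed=s) for s in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    for n in (25, 2500):
        print(n, mean_rms(n))
```

Under these assumptions, increasing the number of stars a hundredfold should reduce the averaged fractional variation by roughly a factor of ten, as expected for N^(−1/2) scaling.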

In order to quantify the expected pixel shimmer, we created a stellar 
population model at solar metallicity that included the time-dependent 
effect of long-period variables. We started with a new library of 
stellar isochrones (J.C. et al., submitted) that densely samples fast 
phases of stellar evolution, and assigned periods to evolved stars 
assuming that they pulsate in the fundamental mode‘. We then used 
observations of variable stars in the Galactic bulge from the OGLE 
survey to estimate a period-amplitude relation in the I band?". A 
smooth surface brightness model of the giant elliptical galaxy M87 
was used to specify the luminosity within pixels of size 0.2″ × 0.2″. 
The pixel luminosity was used to normalize the weights in the iso- 
chrone assuming a Salpeter initial mass function!’. For each pixel 
the number of giants was drawn from a Poisson distribution, and the 
time evolution of the flux for each giant was given by its associated 
period and amplitude and initialized with a random phase. An illus- 
tration of this time-dependent model for M87 is shown in Fig. 1. The 
variable-star part of our model has a tunable parameter, the long- 
period variable star weight, which can be interpreted as the typical 
lifetime of such stars. Further details regarding the modelling are 
provided in Methods. 
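
The per-pixel recipe described above (a Poisson draw for the number of variables, then a periodic light curve with a random phase for each) can be sketched as follows. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: all variables in a pixel share one mean flux, period and amplitude, the light curve is sinusoidal in magnitudes, and the names `poisson_draw` and `pixel_light_curve` and all numerical inputs except the mean of 0.5 variables per pixel are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw a Poisson variate with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def pixel_light_curve(times, smooth_flux, mean_n_lpv, lpv_flux,
                      period, amp_mag, rng):
    """Return (number of variables, flux at each time) for one model pixel.

    Hypothetical simplification: all variables in the pixel share one mean
    flux, period (days) and peak-to-peak amplitude (magnitudes).
    """
    n_lpv = poisson_draw(mean_n_lpv, rng)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_lpv)]
    curve = []
    for t in times:
        # Convert the sinusoidal magnitude offset of each star to a flux.
        variable = sum(
            lpv_flux * 10.0 ** (-0.4 * 0.5 * amp_mag
                                * math.sin(2.0 * math.pi * t / period + ph))
            for ph in phases
        )
        curve.append(smooth_flux + variable)
    return n_lpv, curve


if __name__ == "__main__":
    rng = random.Random(1)
    days = [1.5 * i for i in range(49)]  # spans a 72-d window
    n, curve = pixel_light_curve(days, 1000.0, 0.5, 10.0, 300.0, 1.0, rng)
    print(n, min(curve), max(curve))
```

A pixel that happens to draw zero variables returns a flat curve, which is one reason many pixels are expected to show no variation.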

We sampled the model with the same cadence and applied the same 
photon counting uncertainties as used for existing observations of 
M87 (see below). The resulting model pixel light curves are shown 
in Fig. 2 (blue lines). These pixels were selected to have peak-to-peak 
flux variation >1.5%. While rising and falling curves are clearly seen, 
one also sees that a <100-d observing window can by chance sample 
a light curve at a phase that appears relatively flat. A >200-d observing 
cadence would clearly be ideal for observing the effects of long-period 
variables in the integrated light of nearby galaxies. 

To test these expected variations, we analysed archival data of the 
galaxy M87 from the Hubble Space Telescope (HST) collected over 
72 d in 2005'*-'’. Imaging was obtained in both the F606W and 
F814W filters with the Advanced Camera for Surveys. We focused our 
analysis on the F814W imaging as the F606W data were generally of 
lower quality (owing both to a shorter exposure time and the fact that 
only a single exposure was obtained per visit, which made it difficult to 
clean the images of blemishes such as hot pixels and cosmic rays). The 
data were processed via the standard HST pipeline. In total 52 separate 
images, each with a depth of 1,440 s, were considered in this analysis. 
Globular clusters in the field were used to refine the astrometric align- 
ment with subpixel shifts. Accurate subtraction of the background 
was achieved with several additional corrections to the standard HST 
pipeline, as detailed in Methods. Pixels that deviated by more than 




Figure 1 | Illustration of pixel shimmer. Model prediction of the effect 
of long-period variables on integrated light. a, A smooth model for the 
surface brightness profile of M87. b, The flux at time t= 0 divided by 
the mean flux over 1,000 d within each pixel. c, Zoom-in on the lower 
left corner (boxed in b), showing snapshots at 20-d intervals. Notice the 
coherent variation in brightness of individual pixels. 

Figure 2 | Simulation of pixel light curves. a–e, Each panel shows 
modelled relative flux variation over 200 d for a randomly chosen pixel 
selected to have a peak-to-peak flux variation of >1.5%. The underlying 
stellar population is old and metal-rich, and includes long-period 
variable stars. In each panel, the noise-free model (dashed blue line) is 
compared to a simulation of the M87 data, including the photon counting 
noise (1σ errors) and cadence of the observations (filled black circles and 
error bars), and a boxcar average of the simulated data (red lines and 1σ 
error bars). Also shown in each panel is N_LPV, the number of long-period 
variable stars per pixel with periods >150 d. 


30% from a smooth model of the light profile were masked. This effec- 
tively removed all visible globular clusters, background galaxies, the 
chip gap, and edge effects. We also masked the central region and the 
well-known jet in M87. The data were binned 4 × 4 to 0.2″ × 0.2″ in 
order to reduce the spatial coherency imposed by the point-spread 
function (PSF; note that the models were also spatially binned and 
have been convolved by the PSF in order to emulate the observations 
as closely as possible). 

Example pixel light curves for M87 are shown in Fig. 3. The error 
bars represent photon counting uncertainties only; the solid line is a 
5-point boxcar averaging of the data. We detect coherent variation in 
the pixel light curves that is qualitatively consistent with our model 
expectations. These examples were chosen to highlight the level of 
variety seen in the data. We note that a significant fraction of pixel 
light curves show no evidence of variation within the noise limits of 
the data. We show below that this is expected if long-period variables 
are the source of the variation. 

We have quantified the pixel light curves by fitting each curve with 
a linear function; the best-fit slope and uncertainty were recorded. 
The resulting distribution of slopes (in units of the uncertainty) is 
shown in Fig. 4. We find that 24% of pixels (48,100 out of 202,000) 
show >2σ evidence for variation. In our model there are on average 
1.5 long-period variable stars responsible for variation in each >2σ 
detection. This implies a statistical detection of ~72,000 variable 
stars in M87. When averaged over the central 1′ × 1′ field of view, 
the model predicts on average 0.5 variable stars per pixel. In Fig. 4 
we compare the observations to the model predictions. We show 
the sensitivity of the model light curve statistics to both the stellar 
population age and variable star parameters, and also the posterior 
probability distributions that result from fitting the model to the 
observed histogram when allowing the age and relative variable star 
weight to vary (we do not include the tails of the distribution in the 
fit as the data are slightly asymmetric beyond |slope/error| +7). The 
variable star weight is an overall factor controlling the contribution 
of variable stars to the integrated light relative to the predictions of 
a stellar evolution model (see Methods for details). The pixel light 
curves provide a strong constraint on a combination of the age and 
variable star weight. The dashed line in Fig. 4d shows the best-fit 
age estimated from modelling the integrated light spectrum of the 
central region of M87’°, which allows us to break the degeneracy 
between age and variable star weight. It is noteworthy that the best-fit 
long-period variable star weight is less than one, suggesting that such 
stars in M87 may have shorter lifetimes than current solar-metallicity 
stellar evolution models predict (see Methods for a discussion of the 
effects of metallicity). 
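
The slope/error statistic used above can be illustrated with a standard weighted least-squares straight-line fit. This is a sketch only: the text does not specify the fitting routine, so the closed-form estimator below and the function names `weighted_linear_fit` and `slope_significance` are assumptions made for illustration.

```python
import math


def weighted_linear_fit(times, fluxes, sigmas):
    """Weighted least-squares straight line; returns (slope, slope_error)."""
    weights = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(weights)
    sx = sum(w * t for w, t in zip(weights, times))
    sy = sum(w * f for w, f in zip(weights, fluxes))
    sxx = sum(w * t * t for w, t in zip(weights, times))
    sxy = sum(w * t * f for w, t, f in zip(weights, times, fluxes))
    delta = s0 * sxx - sx ** 2
    slope = (s0 * sxy - sx * sy) / delta
    slope_error = math.sqrt(s0 / delta)  # 1-sigma uncertainty on the slope
    return slope, slope_error


def slope_significance(times, fluxes, sigmas):
    """Best-fit slope in units of its 1-sigma uncertainty (slope/error)."""
    slope, slope_error = weighted_linear_fit(times, fluxes, sigmas)
    return slope / slope_error


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    f = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(slope_significance(t, f, [1.0] * 5))
```

A pixel counts as a detection in the sense used above when |slope/error| exceeds 2. The quoted numbers are mutually consistent: 48,100/202,000 ≈ 0.24, and 48,100 × 1.5 ≈ 72,000.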

This is not the first detection of time variability in the pixel 
fluxes of nearby galaxies; previous work predicted the occurrence 
of a gravitational microlensing signal at the pixel level'’, which was 
subsequently observed'’. Novae have also been observed in nearby 
galaxies'®, and indeed we identified ~15 novae through visual 
inspection of the pixel light curves for M87. However, novae and 
microlensing events are rare (though bright) events. An important 
distinguishing feature of the time variation caused by long-period 
variable stars is the ubiquity: as Fig. 4 shows, 24% of the pixels show 
>2σ evidence for variation. 

There are relatively few constraints on the stellar evolutionary 
phase that gives rise to long-period variables. The best constraints to 
date on this phase are confined to the Magellanic Clouds, which have 
sub-solar metallicities characteristic of low-mass galaxies. The obser- 
vations reported here have provided a direct constraint on this impor- 
tant stellar evolutionary phase in a massive, high-metallicity galaxy. 
New stellar evolution models over-predict the lifetimes of long-period 
variables by approximately 30% if a spectroscopic age for M87 of 
10 Gyr is adopted. An older mean population age would reduce the 
mild tension between the models and observations. Constraints such 
as these on highly evolved, luminous stars are essential for interpret- 
ing light from more distant, massive and metal-rich galaxies across 
the Universe. 

The detection of time variation in the integrated light of nearby 
galaxies opens the way to deriving stellar population ages in these 
systems by a completely different approach from conventional tech- 
niques. In the future, one could imagine high cadence observations 
of nearby galaxies on >100-d baselines being performed to detect the 
period distribution of long-period variables by analysis of the power 
spectra of the time series data. This technique is not limited to old 
stellar systems; on the contrary, younger systems would show con- 
siderably greater temporal variation. For example, on the basis of our 
models, we expect that 4%, 14%, and 22% of pixels with 10° stars would 
show >1% absolute flux changes over 100 d for ages of 10¹⁰, 10⁹, and 
10⁸ yr, respectively. The larger effect at younger ages is due primarily 
to the larger fractional contribution of long-period variables to the 
total light (see Methods for details). It would therefore be relatively 
straightforward to perform similar studies on nearby spiral galaxies, 
where the signal would be much stronger. At a basic and fundamental 
level, each pixel of an observed galaxy varies measurably in time, and 
this variation encodes unique information on its underlying stellar 
population. 

Figure 3 | Observed pixel light curves for M87. a–f, Each panel shows 
observed relative flux variation over 72 d for a different pixel in M87 
(filled black circles). These pixels were selected to highlight the variety 
of morphology of the light curves, including rising, falling, periodic, and 
peculiar curves. We unambiguously detect the ‘pixel shimmer’ due to the 
contribution of long-period variables to the integrated light. Errors represent 
1σ photon counting uncertainties. Red lines and error bars are 5-point 
boxcar averages of the data. 

Figure 4 | Statistics of the pixel light curves. a–c, Normalized 
distributions of the best-fit linear slope of the pixel light curve in 
units of the 1σ uncertainty on the slope (slope/error). The data (black 
line in a) are compared to several models, including a variable-star-free 
model (labelled ‘noise’ in a), and models varying the age (b), variable 
star amplitude (c), and weight (c). Models with varying age and variable 
star weight (‘LPV weight’) were fitted to the observed histogram, and the 
1σ and 2σ confidence limits on these parameters are shown in d (black 
and red lines, respectively). The best-fit model is shown as a red line in a–c 
and a black cross in d. The vertical dashed line in d indicates the best-fit 
age from fitting the integrated light spectrum. In c is shown the effect of 
doubling the amplitudes of the long-period variables (A × 2; blue line) 
and the weights (W × 2; green line). 

METHODS 


Data reduction and tests. Owing to the small expected amplitude of the time- 
dependent flux signal for M87, great care was taken to control systematic effects. 
In this section we describe the details of the data reduction procedure and the 
additional corrections that were applied to the images. 

We began with the publicly available HST images, in which the four dithered 
exposures per visit were combined and astrometrically aligned, resampled by the 
drizzling process, and cosmic rays were removed. The public images include flat 
field corrections and a standard sky subtraction. We used five globular clusters to 
refine the astrometric alignment via subpixel shifts (using bilinear interpolation). 
The mean shift was 0.25 pixels (in both x and y directions) in the unbinned images. 
All of our analysis was performed on images binned 4 × 4, so these shifts are a tiny 
fraction of the final pixel size. 

Owing to the large angular extent of M87, the standard ACS pipeline is not able 
to accurately measure the true sky background. We therefore applied a correction 
to the sky subtraction. We assumed that the true M87 surface brightness profile is 
that reported by Kormendy²⁰, which was derived by combining a variety of space 
and ground-based data. Using this profile, we estimated the sky background in our 
ACS images by minimizing the residuals between the ACS data and Kormendy’s 
profile with the sky background, normalization, and a linear colour gradient as free 
parameters (the last is to account for differences between our F814W filter and 
Kormendy’s V-band profile). This was done separately for each of the 52 images. 
We refer to this as the primary sky background correction. 

In order to test the fidelity of the images over the 72 d, we selected three back- 
ground galaxies and measured their fluxes within an 8-pixel aperture. These 
galaxies should show no detectable temporal variation. The resulting temporal 
variation of the total flux from these galaxies is shown in Extended Data Fig. 1a. 
There are no obvious time-dependent trends. However, the scatter is 0.5%, which is 
relatively large compared to the signal of interest (of the order of 1%). We therefore 
made several additional modifications to the sky background levels in an effort 
to reduce the scatter. 

We identified three additional background galaxies (that is, not the ones used 
to measure the flux variation in Extended Data Fig. 1) and measured their flux 
variation over the duration of the observations. Under the assumption that these 
sources should have no intrinsic flux variation, we determined a sky background 
correction necessary to bring the flux of the background galaxies to a constant. The 
average correction determined this way was 0.002 counts s⁻¹. At this point in the 
analysis, the distribution of pixel light curve slopes showed a slight preference for 
positive slopes (the mean slope/error was +0.5). Under the assumption that the 
true distribution should have a mean of zero, we subtracted a linearly varying sky 
background component (which scaled as 5 x 10-°¢). These two corrections yield 
a distribution of pixel light curve slopes with zero mean (by construction), and a 
temporal flux variation in the three reference background galaxies with a scatter 
of 0.2% as shown in Extended Data Fig. 1b. Moreover, a 5-point boxcar average 
of the light curve of the background galaxies shows flux variation at the ≲0.1% 
level. From this test we conclude that it should be possible to measure intrinsic flux 
variation at the sub-percent level, at least for pixels where photon counting noise 
is not the dominant source of uncertainty. 

We emphasize that the additional sky background corrections discussed above 
do not materially change our conclusions. While these corrections result in a shift 
in the histogram of slope/error values, they have no effect on the width of the dis- 
tribution. Moreover, the example pixel light curves shown in Fig. 2 are unchanged 
within their 1σ error bars. 

Approximately half of the exposures were obtained at a detector location offset 
by 60-70 pixels compared to the other half of the exposures. This provides a further 
test that the trends shown in Fig. 3 and the statistics in Fig. 4 are not dominated 
by unknown systematics at the level of the detector; if they were, one would have 
expected to see flux variation that correlated with the dither pattern, but such 
correlations, if present, are within the noise limits of the data. 

As a final test of both the data reduction and our results, in Extended Data 
Fig. 2 we show histograms of the flux variation over the 72-d observing window. 
The flux variation was computed by temporally binning the exposures by five to 
reduce the Poisson noise in the measurement and computing (maximum − minimum)/mean 
flux at each pixel. The results are shown for three bins of pixel fluxes 
(the legends show the cuts in units of counts per second and the total number 
of pixels per bin). The data (black lines) are compared to our best-fit model as 
derived from fitting the pixel light curve slopes (red lines) and a model without 
long-period variables (blue lines). The good agreement between the model and 
data in all three panels is a strong indication that our measurements are reliable, 
as the panels probe a factor of 40 in dynamic range in pixel fluxes. Systematic 
issues with, for example, the sky subtraction would show up most strongly in 
the pixels with low count rates, and yet the observations and models agree very 
well in that regime. Moreover, this flux variation metric is model independent 
and so the difference between the variable-star-free model and the observations 
provides further strong support that the variation detected in the observations 
is real and not an artefact of some unknown systematics. There do exist subtle 
differences between the model and data that vary as a function of the pixel flux, 
but this could be due to changes in the underlying stellar populations as the pixels 
with low fluxes are in the outskirts, where the ages and metallicities of the stars 
are expected to differ from the central regions. 
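
The flux variation metric described above (temporal binning by five, then (maximum − minimum)/mean) can be sketched as follows. The treatment of a trailing incomplete bin and the use of the binned values for the mean are not specified in the text; both are assumptions here, and the function names `bin_by` and `flux_variation` are hypothetical.

```python
def bin_by(values, size=5):
    """Average consecutive groups of `size` values.

    Assumption: any trailing incomplete group is dropped.
    """
    n_bins = len(values) // size
    return [sum(values[i * size:(i + 1) * size]) / size for i in range(n_bins)]


def flux_variation(fluxes, size=5):
    """(maximum - minimum) / mean of the temporally binned fluxes."""
    binned = bin_by(fluxes, size)
    mean = sum(binned) / len(binned)
    return (max(binned) - min(binned)) / mean


if __name__ == "__main__":
    pixel = [100.0, 101.0, 99.0, 100.5, 100.0,
             102.0, 101.5, 102.5, 101.0, 102.0]
    print(flux_variation(pixel))
```

With 52 exposures and bins of five, this convention yields ten binned points per pixel.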

Modelling long-period variables. Here we provide additional details regarding the 
incorporation of long-period variables in the stellar population synthesis model- 
ling. We start with stellar isochrones that include all relevant evolutionary phases, 
including thermally pulsating asymptotic giant branch (AGB) stars. We include 
a model for circumstellar dust around these stars, which results in dimmer stars 
especially for the most intrinsically luminous and evolved stars”!. Periods (in days) 
are assigned according to the following equation: 


log P = −2.07 + 1.94 log(R/R⊙) − 0.9 log(M/M⊙) (1) 


which assumes that the stars pulsate in their fundamental mode’. Next, we require 
a relation between pulsation period and amplitude. This relation is shown in 
Extended Data Fig. 3 for stars in the Galactic bulge from OGLE data'!. Symbols 
are colour-coded according to the type of pulsator. The dashed lines are the adopted 
period-amplitude (P-A) sequences: 


log A = 0.5 log P − 1.25 (2) 

for the Mira sequence (log P > 2.2), and: 

log A = 2 log P − 5 (3) 


for the semi-regular variable (SRV) sequence (1.0 < logP < 2.2). In the equations 
above, P is in days and the amplitude A is in the I band in magnitudes. We note 
that the SRV sequence is included for completeness but has a very small effect on 
the model predictions. 
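
The period and amplitude assignments can be written as two short functions. This is a sketch under stated assumptions: equation (1) is taken in the commonly quoted fundamental-mode form log P = −2.07 + 1.94 log(R/R⊙) − 0.9 log(M/M⊙), where the +1.94 radius coefficient is an assumption that should be checked against the original; stars outside both period ranges (or exactly at log P = 2.2) are given zero amplitude as a placeholder; and the function names are hypothetical.

```python
import math


def fundamental_period_days(radius_rsun, mass_msun):
    """Fundamental-mode pulsation period in days.

    Assumed form of equation (1):
    log P = -2.07 + 1.94 log(R/Rsun) - 0.9 log(M/Msun).
    """
    log_p = (-2.07 + 1.94 * math.log10(radius_rsun)
             - 0.9 * math.log10(mass_msun))
    return 10.0 ** log_p


def i_band_amplitude_mag(period_days):
    """I-band amplitude in magnitudes from the adopted P-A sequences."""
    log_p = math.log10(period_days)
    if log_p > 2.2:  # Mira sequence, equation (2)
        return 10.0 ** (0.5 * log_p - 1.25)
    if 1.0 < log_p < 2.2:  # semi-regular variable sequence, equation (3)
        return 10.0 ** (2.0 * log_p - 5.0)
    return 0.0  # assumption: no variability assigned outside the sequences


if __name__ == "__main__":
    period = fundamental_period_days(300.0, 1.0)
    print(period, i_band_amplitude_mag(period))
```

The illustrative input (a radius of 300 R⊙ and a mass of 1 M⊙) is not taken from the text; with the assumed form of equation (1) it gives a period of several hundred days, which falls on the Mira sequence.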

The equations above, along with the initial mass function weights determined 
by the masses of the AGB stars in the isochrones, completely specify our default 
variable-star model. In order to convert fluxes to luminosities, we have assumed 
a distance to M87 of 16.7 Mpc (ref. 7). In order to explore the constraining power 
of the data, we considered variation in both the amplitude of the long-period var- 
iables, implemented as an overall scaling of all the amplitudes by the same factor, 
and the weight given to the variable stars in the population synthesis. The latter 
can be interpreted as a change to the typical lifetime. 

We have taken great care to ensure that the long-period variable phase is well- 
resolved in the isochrone tables. The isochrones were constructed from 185 indi- 
vidual mass models and with 600 equivalent evolutionary points in the thermally 
pulsating AGB phase alone. We have run a variety of tests to ensure that our 
model predictions are ‘converged’; for example, we have created models with fewer 
evolutionary points and fewer input mass models and the resulting predictions 
are very similar. For context, at 10 Gyr our isochrones contain 350 points on 
the AGB with periods >200 d, while the publicly available Padova” isochrones 
contain only 3 such points. 

Extended Data Figure 4 quantifies the fractional contribution of long-period 
variables to the total flux of a stellar population as a function of wavelength, 
age, and metallicity ([Z/H]). The flux contribution peaks in the age range of 
10°°-10° yr and increases towards redder bands. The trend with wavelength is a 
reflection of the fact that variable stars are cool and so emit most of their light in 
the near-infrared. We caution that the wavelength-dependence shown here does 
not directly translate into the wavelength-dependence of the time-dependent 
signal because the period-amplitude relation also depends on wavelength. As 
the pulsation directly affects the radius and hence the temperature, for these cool 
stars one expects and indeed observes that the amplitudes are larger in the bluer 
wavebands”*. The metallicity-dependence is relatively modest, at least over the 
range [Z/H] = —0.3 to [Z/H] = +0.3, typical of massive galaxies. It is difficult to 
provide a simple explanation of the model metallicity variation, as it depends not 
only on the variable-star lifetime, luminosity, and temperature, but also on the 
properties of the underlying stellar population. 

We do not expect metallicity to play a critical role in the interpretation of the 
observations for several reasons. First, as noted in the previous paragraph, the 
models suggest a relatively weak metallicity-dependence of the long-period vari- 
able flux contribution. Second, M87 harbours a metallicity gradient™, extending 
from slightly super-solar in the inner R_e/8 to slightly sub-solar at R_e, where R_e is 
the effective radius. Despite this metallicity gradient, our best-fit model provides 
an equally good fit to the pixel shimmer statistics in both the central region and 
the outskirts, as shown in Extended Data Fig. 2. 
We note here that individual long-period variables have been detected in
nearby galaxies including the Magellanic Clouds”, M31°°, and M327’. The most 
distant galaxy with secure detections of individual long-period variable stars is 
NGC 5128”%, and in this case the observations were confined to the outskirts where 
the stellar density was sufficiently low to permit the separation of the brightest 
evolved stars from the background sea of lower luminosity stars. These observa- 
tions of individual long-period variables should provide very useful constraints 
on the modelling of such stars, and we intend to make use of these constraints in 
future work. 
Trends with radius. The HST field of view covers the central 3.3’ x 3.3’ of M87, 
of which the inner ~1’ x 1’ has a signal-to-noise ratio S/N ≳ 100 per pixel for the
observations that were analysed herein. Kormendy” reports an effective radius 
of R_e = 3.2’ so the region of the images with high S/N covers the inner ~0.3R_e.
Extended Data Figure 5 shows several important quantities as a function of R/R_e
for our best-fit model of M87. Extended Data Figure 5a shows the stellar mass per 
pixel for the underlying smooth stellar distribution. Extended Data Figure 5b 
shows the fraction of pixels with |slope/error| > 2. In the main text we reported 
that 24% of pixels reach this criterion, and in fact that percentage remains approx- 
imately constant with radius. The constancy is the result of two opposing effects: 
at larger radius the number of stars per pixel is lower, which implies a larger 
variable-star signal. The effect on the slope scales approximately as √N (here N
is the number of stars per pixel) as multiple variable stars with random phases
will cancel each other out in a central-limit-theorem-like process. However, at
larger radius the S/N is lower, and for a fixed exposure time this also scales as
√N. Thus, for a fixed exposure time, the detectability of long-period variables is
fairly constant with radius. 
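
The cancellation argument above can be illustrated with a small numerical sketch. It is a toy model, not the population-synthesis code behind the results: every star is assumed to have the same mean flux and a sinusoidal variation with a hypothetical fractional amplitude of 0.5 and a uniformly random phase, and the function name is illustrative. Under those assumptions the summed variation grows as √N, so the fractional variation of a pixel falls as 1/√N, the same scaling as the fractional photon-noise uncertainty.

```python
import math
import random

def fractional_rms(n_stars, n_trials=2000, amplitude=0.5, seed=0):
    """RMS fractional flux offset of a pixel containing n_stars variables.

    Toy assumptions: each star has mean flux 1 and an instantaneous offset
    amplitude * sin(phase), with the phase drawn uniformly at random.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_trials):
        offset = sum(amplitude * math.sin(rng.uniform(0.0, 2.0 * math.pi))
                     for _ in range(n_stars))
        # Divide by the mean pixel flux (n_stars) to get a fractional offset.
        total_sq += (offset / n_stars) ** 2
    return math.sqrt(total_sq / n_trials)

rms_10 = fractional_rms(10)
rms_1000 = fractional_rms(1000)
# A 100-fold increase in the number of stars should reduce the
# fractional variation by about sqrt(100) = 10.
ratio = rms_10 / rms_1000
```

For independent random phases the expected value is amplitude/√(2N), so the ratio should come out close to 10, up to sampling noise.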

Extended Data Figure 5c and d show the model trends with radius for a noise- 
free model (infinite S/N). In this case it is clear that the absolute effect of long- 
period variables is larger at larger radius. Extended Data Figure 5c shows the 
fraction of pixels with >1% peak-to-peak flux variation over 200 d. Old stellar 
populations with a pixel mass <10°M,, yield >1% flux variation in ~10% of the 
pixels. Extended Data Figure 5d compares the surface brightness fluctuation (SBF) 
amplitude at a single epoch to the mean temporal variation over a 200-d baseline; 
the latter is smaller than the former by a factor of ~5. The SBF amplitude is
computed as the standard deviation of the model flux divided by a smooth model for
the flux. 

We close by noting that while the overall effect of long-period variables on the 
integrated light is relatively modest at old stellar ages, it is much more prominent 
for younger stellar populations, for example, in the 10°-10° yr range. Future work 
devoted to younger stellar populations will therefore probably uncover a rich array 
of observational signatures of time variable stellar populations. 

Code availability. We have opted not to make the code used in this manuscript 
available because the data reduction and analysis is fairly straightforward and can 
be easily reproduced following the methods described herein. 
Extended Data Figure 1 | Flux of background galaxies. Shown is the
time variation of the flux of three background galaxies. The background
galaxies should show no intrinsic time variation in their flux and therefore
serve as a test of the stability of the data. The mean (μ) and standard
deviation (σ) are reported in each panel. The 1σ error on each point due to
photon counting uncertainty is 0.09%. The solid line is a 5-point boxcar
average. a, Flux variation after the standard data reduction including the
primary sky background correction. The arrow indicates a point that lies
at −2.1. b, Flux variation after additional corrections were applied to the
sky background levels. These additional corrections allow us to achieve a
stability of ~0.1% for boxcar-averaged time series data.
variable stars. Data are for Galactic bulge stars from the OGLE survey"!
Miras and SRVs; these relations are used to assign pulsation amplitudes in
variables (SRVs), and OGLE small-amplitude red giants (OSARGs) are
I (0.8 μm), z (0.9 μm), J (1.2 μm), and K (2.4 μm). The flux contribution
Extended Data Figure 5 | Radial variation of model properties for M87.
a, Stellar mass per pixel for the smooth underlying model for M87 as a
peak-to-peak flux variation over 200 d. d, Strength of surface brightness
fluctuation (SBF) signal at a single epoch compared to the mean temporal
Collisionless encounters and the origin of the lunar inclination


Kaveh Pahlevan¹ & Alessandro Morbidelli¹


The Moon is generally thought to have formed from the debris 
ejected by the impact of a planet-sized object with the proto-Earth 
towards the end of planetary accretion’. Models of the impact 
process predict that the lunar material was disaggregated into a 
circumplanetary disk and that lunar accretion subsequently placed 
the Moon in a near-equatorial orbit®-®. Forward integration of the 
lunar orbit from this initial state predicts a modern inclination 
at least an order of magnitude smaller than the lunar value—a 
long-standing discrepancy known as the lunar inclination 
problem’~°. Here we show that the modern lunar orbit provides a 
sensitive record of gravitational interactions with Earth-crossing 
planetesimals that were not yet accreted at the time of the Moon- 
forming event. The currently observed lunar orbit can naturally 
be reproduced via interaction with a small quantity of mass 
(corresponding to 0.0075-0.015 Earth masses eventually accreted 
to the Earth) carried by a few bodies, consistent with the constraints 
and models of late accretion!®". Although the encounter process 
has a stochastic element, the observed value of the lunar inclination 
is among the most likely outcomes for a wide range of parameters. 
The excitation of the lunar orbit is most readily reproduced via 
collisionless encounters of planetesimals with the Earth-Moon 
system with strong dissipation of tidal energy on the early Earth. 
This mechanism obviates the need for previously proposed (but 
idealized) excitation mechanisms!”", places the Moon-forming 
event in the context of the formation of Earth, and constrains the 
pristineness of the dynamical state of the Earth-Moon system. 

The Moon-forming impact is thought to have generated a compact 
circumplanetary disk (within ten Earth radii, R⊕) of debris out of which
the Moon rapidly accreted. Like Saturn's rings, the proto-lunar disk 
would be expected to become equatorial on a timescale that is rapid 
relative to its evolutionary timescale. Hence, so long as the proto-lunar 
material disaggregated into a disk following the giant impact, the Moon 
is expected to have accreted within about one degree of the Earth’s 
equatorial plane®. Tidal evolution calculations suggest that for every 
degree of inclination of the lunar orbital plane relative to the Earth’s 
equatorial plane at an Earth-Moon separation of 10R⊕, the current
lunar orbit would exhibit about half a degree of inclination relative
to Earth’s orbital plane”. The modern lunar inclination of approximately
5° would, without external influences, translate to an inclination
of about 10° to Earth’s equatorial plane at 10R⊕ shortly after lunar
accretion. This approximately tenfold difference between theoretical 
expectations of lunar accretion and the observed Earth-Moon system 
is known as the lunar inclination problem. 

Previous work on this problem has sought to identify mechanisms 
such as a gravitational resonance between the newly formed Moon and 
the Sun” or the remnant proto-lunar disk! that can excite the lunar 
inclination to a level consistent with its current value. Neither of these 
scenarios is satisfactory, however, as the former requires particular 
values of the tidal dissipation parameters and the latter has only been 
shown to be viable in an idealized system in which a single, fully formed 
Moon interacts with a single pair of resonances in the proto-lunar disk. 


Moreover, previous works assumed that the excitation of the lunar orbit 
was determined during interactions that essentially coincided with 
lunar origin. Here, we propose that the lunar inclination arose much 
later as a consequence of the sweep-up of remnant planetesimals in the 
inner Solar System. 

After the giant impact and at most 10° years'*!°, the Moon has 
accreted, interacted with’? and caused the collapse of the remnant proto- 
lunar disk onto the Earth®, passed the evection resonance with the 
Sun*!*!6 and begun a steady outward tidal evolution. On a timescale 
(10°-107 years) that is rapid relative to that characterizing depletion 
of planetesimals in the final post-Moon-formation stage of planetary 
accretion!’ (called ‘late accretion’), the lunar orbit expands, owing to the
action of tides, to an Earth-Moon separation of 20R⊕-40R⊕. During this
time, the lunar orbit transitions from precession around the spin axis of 
Earth to precession around the normal vector of the heliocentric orbit’, 
and its inclination becomes insensitive to the shifting of the Earth’s 
equatorial plane via subsequent accretion'®. However, as we show, lunar 
inclination becomes more sensitive to gravitational interactions with 
passing planetesimals as the tidal evolution of the system proceeds. The 
sensitivity is such that it renders the lunar orbital excitation a natural 
outcome of the sweep-up of the leftovers of accretion and yields a new 
constraint on the dynamical and tidal environment of the Earth-Moon 
system in the 10° years immediately following its origin. 

Although subsequent collisions with the Earth-Moon system have 
been previously considered as a mechanism for dynamical excitation'’, 
the collision of inner Solar System bodies with the Earth tends to be 
preceded by a large number (10°-10%) of collisionless encounters. 
Excitation via this process is governed by two relevant timescales: the 
timescale over which remnant populations in the inner Solar System 
are lost via accretion onto the planets and the Sun (several tens of mil- 
lions of years!”); and the timescale for the lunar tidal orbital expan- 
sion, which is a rapidly varying function of the Earth-Moon distance. 
The Earth-Moon distance is important because it determines the sys- 
tem cross-section for collisionless encounters with remnant bodies. 
Accordingly, the rate of tidal expansion of the lunar orbit during the 
first approximately 10° years after the giant impact is also important. 
As tidal evolution proceeds and the Earth-Moon separation increases, 
the system becomes increasingly susceptible to collisionless excitation, 
while populations capable of exciting the system are progressively 
depleted. A few tens of millions of years after the Moon-forming event, 
the Earth-Moon system reaches an optimal capacity for excitation via 
gravitational encounters: a dynamically excitable system co-existing 
with a substantial remnant-body population. 
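
The steep dependence of the tidal expansion rate on the Earth-Moon distance can be made concrete with a minimal sketch. It assumes the standard constant-Q expression for tides raised on the planet by a satellite on a circular orbit, da/dt = 3 (k2/Q)(m/M)(R/a)^5 n a with n = sqrt(G(M + m)/a^3), which may differ in detail from the tidal model used in the simulations. Tides raised on the Moon are neglected, k2/Q is held fixed, and the physical constants and the function name are illustrative choices. The starting separation of 5 Earth radii follows the initial condition quoted in the Methods.

```python
import math

G = 6.674e-11               # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24          # kg
M_MOON = 7.342e22           # kg
R_EARTH = 6.371e6           # m
SECONDS_PER_MYR = 3.15576e13

def expansion_time_myr(a_final, a_start=5.0, k2_over_q=0.1):
    """Time (Myr) for the lunar orbit to expand from a_start to a_final.

    Separations are in Earth radii. With da/dt = K * a**(-11/2), the
    integral is t = (2/13) * (a_final**6.5 - a_start**6.5) / K.
    """
    k = (3.0 * k2_over_q * (M_MOON / M_EARTH) * R_EARTH ** 5
         * math.sqrt(G * (M_EARTH + M_MOON)))
    a0 = a_start * R_EARTH
    a1 = a_final * R_EARTH
    seconds = (2.0 / 13.0) * (a1 ** 6.5 - a0 ** 6.5) / k
    return seconds / SECONDS_PER_MYR

t20_strong = expansion_time_myr(20.0, k2_over_q=0.1)
t40_strong = expansion_time_myr(40.0, k2_over_q=0.1)
t40_weak = expansion_time_myr(40.0, k2_over_q=0.01)
```

Under these assumptions the time to reach a given separation scales as a^(13/2), so reaching 40R⊕ takes roughly 90 times longer than reaching 20R⊕, and the time scales inversely with k2/Q.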

Here, we run a series of Monte Carlo simulations to set constraints 
on the outcome of repeated encounters of massive bodies with the 
evolving Earth-Moon system. The simulations are carried out until 
the populations are exhausted either through collision with the Earth 
or through non-terrestrial loss channels (for details, see Methods). 
A sample run of dynamical excitation during the first approximately 
10^8 years of Earth-Moon history is shown in Fig. 1. No single event
dominates: several strong encounters contribute substantially to the 
Figure 1 | Sample realization. a, b, A model of the early lunar orbit
subject to tidal evolution (k2/Q = 0.1) and encounters leading to collision
of two 0.00375M⊕ bodies with the Earth. The semi-major axis of the
evolving lunar orbit a is given in Earth radii (shown in b). Although not
every encounter increases the lunar inclination i, the cumulative effect
shows a tendency towards excitation (shown in a). Notable interactions
include merging collisions with the Earth kicking the lunar orbit via recoil
(at 29.1 Myr and 31.5 Myr), several exceptionally close encounters with the
Moon (at 7.3 Myr and 109.6 Myr) and the exhaustion of the population
(at 141.7 Myr) ultimately marking the end of the simulation. Subsequent
inclination damping owing to planetary tides is modest (from 5.8° at 47R⊕
at the end of the simulation to 5.4° at 60R⊕), a feature that is typical of this
‘late’ excitation mechanism.


final excitation. The size distribution of the late-accreting population 
is assumed to be top heavy, with most of the mass contained in a few 
massive bodies, as has been previously proposed to explain terrestrial 
late accretion’®"!. This particular simulation ultimately results in two 
0.00375M⊕ planetesimals (where M⊕ is Earth’s mass) left over from the
formation process colliding with the Earth. Tidal damping of the lunar 
inclination is applied along with the lunar orbital expansion, following 
equation (1) (see Methods). We do not consider the possibility that the 
lunar inclination might have been more strongly damped via dissipa- 
tion in the lunar magma ocean, as recently proposed’’. In the Methods, 
we show that this effect is not important as long as the lunar magma 
ocean crystallized within a few tens of millions of years. 

Lunar orbital excitation in this epoch depends on the total mass 
of leftover planetesimals, the number of bodies carrying the mass 
and their orbital distribution, the rate of terrestrial tidal dissipation, 
and a stochastic element. In Fig. 2, we show results of simulations of 
the excitation of the lunar inclination due to interaction of the system
with a small amount of mass (equivalent to 0.0075M⊕-0.015M⊕
eventually accreted to the Earth), for different values of the strength 
of tidal dissipation and the number of bodies delivering the mass 
(which is constrained to be <20 colliding with Earth via models of 
late accretion!®1!), 

Several features are apparent. First, there is a quasi-linear dependence 
of the excitation on the total mass of late accretion: other variables 
being equal, excitation corresponding to 0.015M⊕ of late accretion is
approximately twice as great as that with 0.0075M⊕. The mass accreted
onto the Earth thereby provides a proxy for collisionless excitation. 
Second, the lunar orbital excitation exhibits some dependence on the 
strength of tidal evolution: stronger dissipation within the Earth drives 
the lunar orbit outwards faster and exposes the system to more collisionless
events. Simulations with the weakest tides that we considered,
characterized by the ratio of the tidal Love number to the specific dissipation
function k2/Q = 0.01, typically excite the lunar inclination with
a planetesimal population consistent with 0.015M⊕ of late accretion,
whereas with stronger tidal dissipation (k2/Q = 0.1), lunar inclination
is routinely excited by a planetesimal population carrying 0.0075M⊕ of
late accretion. Third, there exists a negative dependence of the excita- 
tion on the number of bodies involved in late accretion, such that the 


[Figure 2. Horizontal axis: strength of tides, k2/Q, from 0.01 to 0.1; a horizontal line marks the modern lunar value.]


Figure 2 | Summary of simulations. Median values (symbols), and 1σ
(solid lines) and 2σ (dashed lines) intervals for the lunar inclination i at
the end of the simulations after damping via planetary tides to the modern
Earth-Moon separation. The excitation in the modern lunar orbit is plotted
for comparison, indicated by the horizontal line labelled ‘lunar’. Diamonds
correspond to simulations with 0.0075M⊕ accreted to Earth; squares
correspond to 0.015M⊕. ‘Strong’ tidal dissipation (k2/Q = 0.1) corresponds
to a hot dissipative silicate Earth, and ‘weak’ dissipation (k2/Q = 0.01)
represents the geologic average value dominated by dissipation in shallow
oceans. In these simulations, the accretion of 0.0075M⊕ (with ‘strong’ tides)
to 0.015M⊕ (with ‘weak’ tides) frequently reproduces the excitation in the
lunar orbit. The number of bodies delivering the late accreted mass in each
set of simulations is reported above each symbol.


mechanism requires a population that is top heavy (with most of the 
mass delivered via the most massive bodies). For a given mass of late 
accretion, a greater number of bodies also renders the distribution of 
lunar inclinations more strongly peaked and predictions of the expected 
excitation more precise. Despite an order of magnitude of uncertainty 
in the strength of early terrestrial tides (k2/Q) and in the number of 
bodies involved in the leftover population, and the stochasticity that is 
inherent in collisionless encounters, close encounters with a population
of planetesimals delivering 0.0075M⊕-0.015M⊕ to the Earth after
the Moon-forming event can robustly reproduce the excitation that 
characterizes the lunar orbit. 

The angular momentum of the Earth-Moon system at the time of 
its origin is a central feature diagnostic of various proposed giant- 
impact scenarios!?*'8, Given that the lunar orbit provides a sensitive 
dynamical measure of encounters with the Earth after the origin of the 
Moon, we ask whether such gravitational interactions were effective in 
injecting or extracting angular momentum. Figure 3 summarizes the 
angular momentum change versus the final excitation. The change in 
angular momentum corresponding to the modern inclination excita- 
tion of approximately 5° is probably a few tens per cent or less. Hence, 
the standard giant-impact scenario‘ followed by little subsequent 
dynamical modification is compatible with the dynamical state of the 
modern system, whereas a high-angular-momentum impact scenario*” 
would require another dynamical mechanism such as the evection res- 
onance*'*'* to be reconciled with the modern Earth-Moon system. 

The sensitivity of the orbits of impact-generated satellites to ongoing 
accretion onto the host planet has several consequences. The degree 
of orbital excitation resulting from interaction with, and accretion of, 
0.0075M⊕-0.015M⊕ onto the early Earth suggests that collisionless
encounters with massive bodies—such as the Moon-to-Mars mass 
embryos thought to have played a key role in the accretion of the 
Earth—would have excited satellites to very eccentric orbits ulti- 
mately leading to their dynamical loss, either via collision with the 
host planet or liberation into heliocentric orbit. Such excitability of 
impact-generated satellite orbits may explain several features of the 
inner Solar System that have yet to be understood. For example, despite 
impact-generated satellites being a quasi-generic feature of terrestrial 
planet formation via giant impact, the absence of an impact-generated 
Figure 3 | Angular momentum change of the Earth-Moon system.
Median values (symbols), and 1σ (solid lines) and 2σ (dashed lines)
intervals for inclination i and angular momentum L_z change via post-lunar
collisionless encounters. Diamonds represent realizations with weak tides
(k2/Q = 0.01) and 0.0075M⊕ accretion; squares correspond to realizations
with stronger dissipation (k2/Q = 0.1) and 0.015M⊕ accretion, bracketing
the range in our simulations. Each suite of simulations is composed of two
subsets: one with late accretion delivered via one body (greater excitation);
the other, four bodies (lesser excitation). Intermediate outcomes with two
accreted bodies are omitted for clarity. For a level of excitation consistent
with the modern lunar orbit (5.15°), the amount of system angular
momentum change is probably <20%. However, the 2σ intervals for the
strongest excitation case plotted extend to Δi = 42° and |ΔL_z/L_z| = 0.48,
implying a small probability (<5%) for angular momentum change >50%.


satellite around Venus” and the apparent absence of a pre-Moon ter- 
restrial satellite’! can be understood: any such early-formed satellites 
would have been lost via encounters with extant planetary embryos, 
including perhaps the Moon-forming impactor itself. Moreover, the 
occurrence of the Moon-forming giant impact late in the history of 
Earth accretion can be understood as a necessity for its survival: even 
satellites generated moderately earlier would have been readily dynam- 
ically destabilized. Just as the survival of planets depends on the sur- 
rounding stellar environment”, the survival of an impact-generated 
satellite depends on the planetary environment at the time of origin. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


A large number of simulations (about 10°) are required to characterize the distribu- 
tion of outcomes for repeated encounters of a given planetesimal population with 
a given early Earth-Moon system. Accordingly, we design a numerical experiment 
that captures the physics of the problem statistically and that can be computed 
efficiently. Heliocentric orbits for late-accreting planetesimal populations were 
generated according to a Rayleigh distribution with a Rayleigh eccentricity e_R = 0.3
and inclination i_R = e_R/2; these values are consistent with simulations of terrestrial
planet formation’. To test for sensitivity to population orbits, we varied e_R between
0.3 and 0.4; the resulting median inclination excitation changed by less than 10%. 
With given orbital distributions, the subset of the population that is Earth-crossing 
was selected and encounter probabilities with the Hill radius (R_H) of the Earth were
calculated according to expressions given in ref. 24. The masses of the planetesimals
are assumed to be in the range 0.15M_L-1.2M_L (M_L is the lunar mass), consistent
with those expected for the projectiles carrying the Earth’s late accretion’. At the 
beginning of the simulations, the Earth and the Moon were placed on circular 
uninclined orbits with radii of 1 au and 5R⊕, respectively, near their orbits at the
end of accretion and the beginning of tidal history. An encounter time and encoun- 
ter orbit were chosen randomly according to the distribution of Earth-crossing 
planetesimal encounter probabilities. At the time of each encounter, phases for 
the lunar orbit, characterized by the argument of perigee (ω), the longitude of
the ascending node (Ω) and the mean anomaly (M) were selected randomly, as
was the orientation of the planetesimal orbit, within those orientations admitted
by the selected orbital parameters. The impact parameter (b) was selected in
the interval [0, R_H] according to a uniform encounter probability per unit area
(dP ∝ b db). Gravitational three-body (Earth-Moon-planetesimal) encounters
were integrated with a Bulirsch-Stoer integrator (included in the SWIFT package)
in a geocentric reference frame, tracking changes to the lunar orbit. In between 
three-body encounters, the eccentricity (e) and semi-major axis (a) of the lunar 
orbit were evolved with a constant-Q tidal model”? while the lunar inclination was 
evolved with a model” for planetary tides: 


$$\frac{1}{i}\frac{\mathrm{d}i}{\mathrm{d}t} = -\frac{1}{4a}\frac{\mathrm{d}a}{\mathrm{d}t} \qquad (1)$$
Impacts with the Earth were assumed to be inelastic merging events with the final 
body carrying the total mass and momentum. Impacts onto the Moon would have 
been in the erosive and/or catastrophic disruption regime and realizations with 
such events were removed from the subsequent analysis (discussed below). 
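
The random draws described above can be sketched as follows. This is a minimal illustration, not the simulation code: it assumes that the quoted Rayleigh eccentricity e_R is the scale parameter of the distribution, that eccentricities of 1 or more are rejected, and that the lunar orbital phases are uniform over a full circle. The function names and the use of a Hill radius of 1 (arbitrary units) are hypothetical. The selection of the Earth-crossing subset and the encounter probabilities of ref. 24 are not reproduced.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def sample_rayleigh(scale, rng, upper=None):
    """Inverse-transform draw from a Rayleigh distribution with the given scale.

    If upper is set, values at or above it are rejected and redrawn.
    """
    while True:
        value = scale * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        if upper is None or value < upper:
            return value

def sample_impact_parameter(r_hill, rng):
    """Draw b in [0, r_hill] with dP proportional to b db."""
    # The cumulative probability is (b / r_hill)**2, so b = r_hill * sqrt(u).
    return r_hill * math.sqrt(rng.random())

def sample_encounter(rng, e_rayleigh=0.3, r_hill=1.0):
    """Draw one hypothetical encounter; angles are in radians."""
    return {
        "eccentricity": sample_rayleigh(e_rayleigh, rng, upper=1.0),
        "inclination": sample_rayleigh(e_rayleigh / 2.0, rng),
        "impact_parameter": sample_impact_parameter(r_hill, rng),
        # Lunar orbital phases, assumed uniform over a full circle.
        "arg_perigee": rng.uniform(0.0, TWO_PI),
        "ascending_node": rng.uniform(0.0, TWO_PI),
        "mean_anomaly": rng.uniform(0.0, TWO_PI),
    }

rng = random.Random(1)
encounters = [sample_encounter(rng) for _ in range(20000)]
mean_b = sum(e["impact_parameter"] for e in encounters) / len(encounters)
inner_fraction = sum(e["impact_parameter"] < 0.5 for e in encounters) / len(encounters)
```

Because the probability is uniform per unit area, the mean impact parameter tends to (2/3)R_H and only about a quarter of the encounters fall within R_H/2, so distant encounters dominate by number.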

Remnant planetesimals can, in general, be eliminated via collision with the ter- 
restrial planets and the Sun or dynamical ejection from the inner Solar System". 
We characterize such losses using the outcome of direct N-body simulations that 
trace the evolution of such early planetesimals, yielding a tenfold depletion of the 
Earth-crossing population in the first 100 Myr, which corresponds to a population 
decay law of approximately exp(−t/τ_ss), where t is time and τ_ss is the time constant
for the decay of the population (approximately 44 Myr) (ref. 17). Although
the modern near-Earth-object population is resupplied by the asteroid belt in a 
quasi-steady state fashion and effectively does not decay, the leftover planetesi- 
mal population is not resupplied by a larger reservoir and therefore does decay. 
Owing to partial resupply of Earth-crossing bodies, the timescale for the decay 
of the Earth-crossing population can nevertheless be different to the lifetime of 
individual particles. To integrate the decay rates of Earth-crossing planetesimals 
using our simulations, we use the following procedure. After generating orbital 
populations, but before running three-body integrations, we allow the Earth- 
crossing populations to encounter the Earth alone, which permits derivation of 
a time constant for decay of this population solely via collision with the Earth 
(τ_E = 79 Myr). Next, we require that the Earth-crossing planetesimal population
in our three-body simulations decay at the same average rate as that observed in 
the N-body heliocentric simulations. We therefore decompose average loss rates 
of the Earth-crossing population into terrestrial and non-terrestrial loss modes 
(1/τ_ss = 1/τ_E + 1/τ_NE), and thereby derive a time constant (τ_NE = 99 Myr) for
removal via non-terrestrial loss channels. Accordingly, we stochastically remove 
bodies from the population in our three-body simulations such that the average 
loss rate of Earth-crossing planetesimals—through collision with Earth (explicit) 
as well as through other modes of loss (implicit)—is consistent with the aver- 
age loss rates observed in N-body simulations of late accretion!” (see Extended 
Data Fig. 1a). 
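
The arithmetic behind the quoted time constants can be checked with a short sketch. The values of 44 Myr and 79 Myr are taken from the text; the per-step removal function is a hypothetical illustration of stochastic removal and assumes that both loss channels are independent exponential decays.

```python
import math
import random

TAU_SS = 44.0  # Myr, overall decay constant of the Earth-crossing population
TAU_E = 79.0   # Myr, decay constant for collision with the Earth alone

# A tenfold depletion in 100 Myr implies exp(-100 / tau) = 0.1.
tau_from_depletion = 100.0 / math.log(10.0)

# Decompose the total loss rate: 1/tau_ss = 1/tau_E + 1/tau_NE.
tau_ne = 1.0 / (1.0 / TAU_SS - 1.0 / TAU_E)

# For competing exponential channels, the fraction of bodies that
# eventually collide with the Earth is the ratio of the rates.
earth_fraction = TAU_SS / TAU_E

def lost_to_other_channels(dt_myr, rng, tau=tau_ne):
    """Return True if a body is removed by a non-terrestrial channel in dt_myr."""
    return rng.random() < 1.0 - math.exp(-dt_myr / tau)

rng = random.Random(0)
removed = sum(lost_to_other_channels(10.0, rng) for _ in range(100000)) / 100000
```

The tenfold depletion gives about 43.4 Myr, consistent with the adopted 44 Myr, and the decomposition gives about 99.3 Myr for the non-terrestrial channel, matching the quoted 99 Myr.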

Each data point in Fig. 2 is the result of 4,000 realizations. To analyse the results, 
certain realizations were eliminated. These realizations fall into three categories: 
those that result in (1) a collision of a planetesimal with the Moon; (2) dynamical 
loss of the Moon; and (3) too large or too small a mass accreted by the Earth. 

(1) Occasionally, one of the planetesimals in our simulations collides with the 
Moon rather than with the Earth. For simulations that deliver the late-accreted 
mass to Earth via one, two and four planetesimals, the fraction of realizations
in which such an event takes place is 9%, 15% and 25%, respectively. Given
the masses that we assume for the planetesimals (0.15M_L-1.2M_L), it is doubtful
that the Moon ever experienced such a massive collision. The largest lunar
impact for which we have clear and unambiguous evidence is the South Pole 
Aitken Basin event (an approximately 10* erg event’), which, at the encounter 
velocities considered here (see Extended Data Fig. 1b), corresponds to a lunar 
impactor that is 3-4 orders of magnitude less massive than the planetesimals 
in our populations. Although most basin-forming impacts are thought to have 
occurred during the late accretion era’, the effect of these impacts on the lunar 
inclination was minor’. 

(2) Certain realizations, particularly those that correspond with the strongest 
tides and largest amount of interacting mass, generate excitations in eccentricity 
that are sufficient to destabilize the satellite orbit. Hence, collision with the host 
planet or (more rarely) liberation of the satellite into heliocentric orbit follows. 

(3) The total amount of planetesimal mass at the start of simulations was 
chosen such that, on average, the mass accreted by the Earth would be 0.0075M⊕
or 0.015M⊕, hereafter denoted the ‘target mass’. Given the stochasticity inherent
to this problem, the accreted mass varies between realizations, resulting in a 
distribution of outcomes centred on the target mass. To facilitate the expression 
of the results in terms of late-accreted mass, we eliminate from the subsequent 
analysis those realizations whose accreted mass is greater than or less than the 
target mass. 

Differential momentum transfer is the process underlying this excitation 
mechanism. For simplicity, we describe this process for the case of a planetes- 
imal colliding with the Earth, but the general case of a collisionless encounter is 
similar. The orbit of the Earth and Moon around their common centre of mass 
is defined by their relative position and velocity. A third body encountering the 
Earth-Moon binary must have an orbit that crosses the system’s heliocentric orbit, 
and approaches it with some finite velocity at large separation. Hence, the delivery 
of mass onto the Earth is accompanied by the delivery of external momentum 
that—in the impulse approximation—changes the relative velocity, but not the 
relative position, of the Earth with respect to the Moon, altering the mutual orbit, 
a hitherto overlooked effect that can excite the lunar inclination and eccentricity. 
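
The size of the effect can be estimated with a minimal sketch of the impulse approximation for a merging collision. The planetesimal mass of 0.00375M⊕ is taken from the sample run; the impact velocity of 15 km/s, its direction perpendicular to the lunar orbital plane, the Earth-Moon separation of 30R⊕ and all function names are hypothetical choices made for illustration. The Moon's position and velocity are held fixed during the impact, and the Earth's velocity changes by momentum conservation.

```python
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24  # kg
M_MOON = 7.342e22   # kg
R_EARTH = 6.371e6   # m

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def inclination_deg(r, v):
    """Inclination relative to the x-y plane, from h = r x v."""
    h = cross(r, v)
    h_norm = math.sqrt(sum(c * c for c in h))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, h[2] / h_norm))))

def earth_recoil(m_body, v_impact):
    """Velocity change of the Earth after a perfectly inelastic merger."""
    factor = m_body / (M_EARTH + m_body)
    return tuple(factor * c for c in v_impact)

# Moon on a circular, uninclined orbit at 30 Earth radii (geocentric frame).
a = 30.0 * R_EARTH
v_circ = math.sqrt(G * (M_EARTH + M_MOON) / a)
r_moon = (a, 0.0, 0.0)
v_moon = (0.0, v_circ, 0.0)

# Hypothetical impact: 15 km/s, perpendicular to the lunar orbital plane.
dv_earth = earth_recoil(0.00375 * M_EARTH, (0.0, 0.0, 15.0e3))

# The relative position is unchanged; only the relative velocity shifts.
v_rel = tuple(vm - dv for vm, dv in zip(v_moon, dv_earth))
inc_before = inclination_deg(r_moon, v_moon)
inc_after = inclination_deg(r_moon, v_rel)
```

With these inputs the Earth recoils by roughly 56 m/s against a lunar orbital speed of about 1.45 km/s, tilting the orbit by about 2°. This geometry is the most favourable one for inclination change; other orientations would put more of the kick into eccentricity or semi-major axis.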

We ask whether the satellite excitation is dominated by the few strongest 
encounters or by the much more numerous distant ones. Theory suggests that 
for top-heavy perturber populations, the few strongest perturbations dominate 
over the more numerous weak ones”*. We test this theory for our simulations by 
generating realizations where the impact parameter is chosen in the interval 
[0, R_H], [0, R_H/2] and [0, R_H/4], progressively eliminating a large number of distant
encounters. The results of the three simulations are statistically indistinguishable 
(see Extended Data Fig. 2a), confirming the theoretical expectation. To facilitate the 
reproducibility of our results, we plot a measure of the strength of a perturbation 
against the impact parameter of the encounters for two individual simulations 
(Extended Data Fig. 2b, c). 

The simulations are permitted to proceed until the population of Earth-crossing
bodies is exhausted either through collision with the Earth or through loss via
another channel in accordance with an average rate (see above). At the end of the
simulations, which characteristically last about 10^8 years, the lunar orbit is typically
at about 40R⊕. To compare the simulation outcomes to the modern system, we
propagate the lunar orbit forward to its current separation at 60R⊕ and permit the
inclination to decay in accordance with the action of planetary tides (equation (1)). 
The number of simulations was chosen such that the median inclination values 
vary by only several per cent. 
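
The forward propagation to the modern separation can be sketched in closed form. The example assumes that equation (1) has the form (1/i) di/dt = −(1/4a) da/dt, which integrates to i ∝ a^(−1/4); this form is inferred from the damping quoted in the caption of Fig. 1 (5.8° at 47R⊕ to 5.4° at 60R⊕) and should be treated as an assumption. The function name is hypothetical.

```python
def damped_inclination(i_start_deg, a_start, a_end):
    """Inclination after tidal expansion from a_start to a_end (same units).

    Assumes (1/i) di/dt = -(1/4) (1/a) da/dt, so that i scales as a**(-1/4).
    """
    return i_start_deg * (a_end / a_start) ** -0.25

# Sample run of Fig. 1: 5.8 degrees at 47 Earth radii, propagated to 60.
i_modern = damped_inclination(5.8, 47.0, 60.0)

# Typical case: end of simulation at about 40 Earth radii.
typical_factor = damped_inclination(1.0, 40.0, 60.0)
```

For the sample run this gives an inclination of about 5.5°, close to the quoted 5.4° given the rounding of the inputs, and for a typical end state at 40R⊕ the damping to 60R⊕ is about 10%.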

Recently, it has been suggested that the lunar inclination could have been 
damped via obliquity tides in the lunar magma ocean (LMO) as the Moon’s 
obliquity increased during its approach to the Cassini state transition between 
20R⊕ and 30R⊕ (ref. 19). The authors of ref. 19 put forward one interpretation of
the current excited state of the lunar inclination: that the inclination was excited 
early”, but that the rate of tidal dissipation in the post-giant-impact Earth 
was sufficiently low to delay passage of the lunar orbit through the Cassini state 
transition until after the crystallization of the LMO, at which point the effect 
of obliquity tides on the lunar inclination becomes much less. With the ‘late’ 
mechanism of lunar orbital excitation described here, we identify a different 
solution: that the rate of tidal dissipation on the post-impact Earth is sufficiently 
rapid in the first tens of millions of years to carry the Moon through the Cassini 
state transition and to damp any early acquired lunar inclination. Following 
such a resetting episode, the LMO crystallizes, and inclination excitation due 
to encounters is subsequently preserved. To explore this solution, we ran a suite 
of simulations in which the inclination is reset to zero until a certain time, and 
permitted to accumulate excitation subsequently (see Extended Data Fig. 3). 
Such a transition marks the time of crystallization of the LMO. It can be seen 
that as long as the duration of the LMO crystallization is sufficiently short, such 
a solution is viable and, indeed, necessary in a tidal evolution scenario recently 
described”’. 

Code availability. The code used to conduct these simulations is available by 
request from the authors. 
Extended Data Figure 1 | Properties of planetesimal populations. 
a, Decay rates of Earth-crossing planetesimal populations according 
to N-body simulations of the inner Solar System with a resonant (3:2) 
Jupiter and Saturn at 5.4 au and 7.2 au, respectively. Different colours 
represent the number of Earth-crossing bodies (hence the evolution is 
not monotonic) in different simulations, from recent integrations’”. The 
black line is the prescribed decay rate used in the three-body simulations 
(τ = 44 Myr) to match the decay rate in heliocentric simulations. 
b, Histogram of implemented approach velocities (before acceleration due 
to Earth gravity) for late-accreting populations in three-body simulations. 
The population is generated using a Rayleigh distribution of eccentricities 
and inclinations (e_R = 0.3, i_R = e_R/2) and a semi-major axis range 
(a = 0.8-1.4 au) that produces Earth-crossing orbits. These parameters are 
motivated by simulations of terrestrial planet formation, but the peak of 
the distribution (9 km s⁻¹) corresponds to the typical encounter velocity 
inferred for lunar basin-forming impactors*’. 
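The population described in panel b can be illustrated with a minimal sketch. It assumes that 0.3 is the Rayleigh scale parameter for eccentricity, that the inclination scale is half of it (in radians), that semi-major axes are drawn uniformly, and that the Earth moves on a circular orbit at 1 au. The function names are hypothetical and this is not the authors' code.

```python
import math
import random


def sample_rayleigh(scale, rng):
    """Draw one Rayleigh-distributed value by inverting the CDF."""
    u = rng.random()  # u lies in [0, 1), so 1 - u is never zero
    return scale * math.sqrt(-2.0 * math.log(1.0 - u))


def sample_earth_crossers(n, e_scale=0.3, a_min=0.8, a_max=1.4, seed=0):
    """Return n (a, e, i) triples whose orbits cross 1 au.

    a is in au, i in radians. Uniform sampling in a is an assumption.
    """
    rng = random.Random(seed)
    i_scale = e_scale / 2.0
    bodies = []
    while len(bodies) < n:
        a = rng.uniform(a_min, a_max)
        e = sample_rayleigh(e_scale, rng)
        inc = sample_rayleigh(i_scale, rng)
        if e >= 1.0:
            continue  # discard unbound orbits
        # Earth-crossing: perihelion inside 1 au and aphelion outside 1 au
        if a * (1.0 - e) < 1.0 < a * (1.0 + e):
            bodies.append((a, e, inc))
    return bodies


if __name__ == "__main__":
    population = sample_earth_crossers(1000)
    mean_e = sum(e for _, e, _ in population) / len(population)
    print(f"{len(population)} bodies, mean eccentricity {mean_e:.3f}")
```

Rejection sampling keeps only orbits that actually cross 1 au, so the accepted eccentricities are biased upward relative to the raw Rayleigh draw.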
Extended Data Figure 2 | Tests and outcomes for reproducibility. a, Test 
of the cumulative effect of repeated encounters: median values (squares), 
and 1σ (solid lines) and 2σ (dashed lines) intervals for three suites of 
simulations with 0.0075M⊕ accreted via a single body onto an Earth with 
strong dissipation (k_2/Q = 0.1). Each suite of simulations consists of 
incoming planetesimals with impact parameters (b) ranging from 0 to R_H, 
R_H/2 and R_H/4 (as indicated by the values given above each simulation 
result). The statistical similarity of the resultant distributions shows that 
distant encounters have a far smaller effect on inclination excitation than 
do the rare and strong close encounters. b, c, Distribution of ‘kicks’ versus 
impact parameter (b) of encounters from two different realizations. 
The change in the angular momentum vector of the satellite (Δh) owing to 
encounter torques is normalized to the magnitude of the orbital angular 
momentum before the encounter (h). The planetesimals approaching the 
Earth-Moon system in this simulation have a mass of 0.0075M⊕. Approach 
velocities are selected from the distribution plotted in Extended Data 
Fig. 1b. The cumulative effects of encounters with b > 60R⊕ are negligible 
and therefore neglected. Panel b (c) shows data from a realization that lasts 
26.2 Myr (45.3 Myr) and results in a satellite with a final inclination of 
i = 1.9° (i = 8.8°). The strength of tidal dissipation used here (k_2/Q = 0.1) 
quickly results in a satellite semi-major axis of a = 30R⊕-40R⊕. 
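The normalized 'kick' plotted in panels b and c can be sketched as follows. This assumes that the satellite's geocentric position and velocity are known just before and just after an encounter, and that the z axis is normal to the reference plane. The function names are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in u))


def normalized_kick(r_before, v_before, r_after, v_after):
    """Return |h_after - h_before| / |h_before| for the specific angular momentum h = r x v."""
    h0 = cross(r_before, v_before)
    h1 = cross(r_after, v_after)
    dh = tuple(b - a for a, b in zip(h0, h1))
    return norm(dh) / norm(h0)


def inclination_deg(r, v):
    """Inclination of the orbit plane relative to the z axis, in degrees."""
    h = cross(r, v)
    return math.degrees(math.acos(h[2] / norm(h)))


if __name__ == "__main__":
    r = (1.0, 0.0, 0.0)
    v0 = (0.0, 1.0, 0.0)                        # orbit in the reference plane
    v1 = (0.0, math.cos(0.1), math.sin(0.1))    # velocity tilted by 0.1 rad
    print(normalized_kick(r, v0, r, v1), inclination_deg(r, v1))
```

Because the kick is a vector difference, it captures changes in both the orientation and the magnitude of the orbital angular momentum. Only the orientation change alters the inclination.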
Extended Data Figure 3 | Effect of partial damping due to LMO 
obliquity tides. Median values (squares), and 1σ (solid lines) and 2σ 
(dashed lines) intervals for several suites of partially damped simulations. 
These simulations consist of an accreted mass of 0.0075M⊕ delivered 
via a single body onto a strongly dissipative (k_2/Q = 0.1) Earth, with the 
orbital excitation continuously reset (e = 0, i = 0) until a certain time and 
permitted to accumulate subsequently. Such a transition represents LMO 
crystallization and the cessation of inclination damping via obliquity 
tides!’. These simulations are representative of excitation behaviour for 
partially damped cases. It can be seen that, so long as the crystallization of 
the LMO is sufficiently rapid (about 10⁷ years), excitation via planetesimal 
encounters is largely unaffected. 
Type-II Weyl semimetals 


Alexey A. Soluyanov, Dominik Gresch, Zhijun Wang, QuanSheng Wu, Matthias Troyer, Xi Dai & B. Andrei Bernevig 


Fermions—elementary particles such as electrons—are classified as 
Dirac, Majorana or Weyl. Majorana and Weyl fermions had not been 
observed experimentally until the recent discovery of condensed 
matter systems such as topological superconductors and semimetals, 
in which they arise as low-energy excitations'-°. Here we propose 
the existence of a previously overlooked type of Weyl fermion that 
emerges at the boundary between electron and hole pockets in a new 
phase of matter. This particle was missed by Weyl’ because it breaks 
the stringent Lorentz symmetry in high-energy physics. Lorentz 
invariance, however, is not present in condensed matter physics, 
and by generalizing the Dirac equation, we find the new type of 
Weyl fermion. In particular, whereas Weyl semimetals—materials 
hosting Weyl fermions—were previously thought to have standard 
Weyl points with a point-like Fermi surface (which we refer to as 
type-I), we discover a type-II Weyl point, which is still a protected 
crossing, but appears at the contact of electron and hole pockets in 
type-II Weyl semimetals. We predict that WTe2 is an example of 
a topological semimetal hosting the new particle as a low-energy 
excitation around such a type-II Weyl point. The existence of type-II 
Weyl points in WTe2 means that many of its physical properties are 
very different to those of standard Weyl semimetals with point-like 
Fermi surfaces. 

The band structure of some metals has non-trivial topological fea- 
tures”. Of such metals, the ones with vanishingly small density of states 
at the Fermi level—semimetals—stand out. For these materials, a dis- 
tinction between topologically protected surface states and bulk metal- 
lic states can be made and their Fermi surfaces can be topologically 
characterized, unlike the case for metals, which have many states at the 
Fermi level. Two kinds of topological semimetals have attracted spe- 
cial attention: Dirac and Weyl semimetals. In these materials, a linear 
crossing of two (Weyl) or four (Dirac) bands occurs at the Fermi level 
(see Fig. 1a). The effective Hamiltonian for these crossings is given by 
the Weyl or gapless-Dirac equation, respectively. The Weyl crossings 
are protected from gapping, owing to the massless nature of the Weyl 
fermion. In the following, we limit the discussion to Weyl crossings 
only, although our results also hold for Dirac crossings. 

The appearance of Weyl points (WPs) is possible only if the product 
of parity and time reversal is not a symmetry of the structure. When 
present, a WP acts as a topological charge—either a source or a sink of 
Berry curvature. A Fermi surface enclosing a WP has a well-defined 
Chern number, corresponding to the topological charge of this WP. 
Because the net charge must vanish in the entire Brillouin zone, WPs 
always come in pairs; they are stable to weak perturbations and are 
annihilated only in pairs of opposite charge. A large number of unusual 
physical phenomena are associated with Weyl topological semimet- 
als, including the existence of open Fermi arcs in the surface Fermi 
surface’ and various magnetotransport anomalies”. 

Weyl semimetals with broken time-reversal symmetry have been 
predicted to exist in several materials!!°'’, but these predictions have 
yet to be experimentally verified. More recently, the Weyl semimetal 
was predicted to exist in inversion-breaking single-crystal non- 
magnetic materials of the TaAs class**; this prediction has since been 
verified experimentally*®. 


Weyl semimetals were previously thought to have a point-like 
Fermi surface at the WP. We refer to these as type-I WPs (WP1s), to 
distinguish them from the new type-II WPs (WP2s) that exist at the 
boundaries between electron and hole pockets, as illustrated in Fig. 1b. 
We discuss general conditions for WP2s to appear, and present evi- 
dence that WTe2, the material with the largest never-saturating 
magnetoresistance reported’® so far, is an example of the new type 
of topological semimetal hosting eight WP2s. These WP2s come in 
two quartets located 0.052 eV and 0.058 eV above the Fermi level. 
We present topological arguments that prove the existence of the 
new topological semimetal phase in WTe2. We provide evidence of 
doping-driven topological Lifshitz transitions, which are characteristic 
of WP2s, as well as emerging Fermi arcs in the surface Fermi surface. 

We start by considering the most general Hamiltonian describing 
a WP 


\[
H(\mathbf{k}) = \sum_{\substack{i=x,y,z \\ j=0,x,y,z}} k_i A_{ij} \sigma_j
\]


where k is the wave vector in reciprocal space (crystal momentum vec- 
tor), A is a 3 × 4 matrix of coefficients, σ_0 is the 2 × 2 unit matrix and 
σ_j, j = x, y, z, are the three Pauli matrices. The energy spectrum is 


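\[
\varepsilon_{\pm}(\mathbf{k}) = \sum_{i=x,y,z} k_i A_{i0} \pm \sqrt{\sum_{j=x,y,z} \Bigl( \sum_{i=x,y,z} k_i A_{ij} \Bigr)^{2}} = T(\mathbf{k}) \pm U(\mathbf{k}) \qquad (1)
\]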


where T(k) and U(k) can be considered as the kinetic and potential 
components of the energy spectrum. T(k), which is linear in momen- 
tum, tilts the cone-like spectrum ε±(k). This tilt breaks the Lorentz 
invariance of Weyl fermions in quantum field theory, but was previously 
considered unimportant. However, because Lorentz invariance does not 
need to be respected in condensed matter, its inclusion is important and 
leads to a finer classification of distinct Fermi surfaces, in correspond- 
ence with the theory of quadric surfaces, which suggests that there are 
exactly two distinct types of WPs (see Supplementary Information). 

If, for a particular direction in reciprocal space, T is dominant over 
U, the tilt becomes large enough to cause a WP to appear at the point 
where the open electron and hole pockets touch, contrary to the stand- 
ard case of a point-like Fermi surface. Thus, the condition for a WP to 
be of type II is that there exists a direction k for which T(k) > U(k). If 
such a direction does not exist, then the WP is of type I. The clear 
qualitative distinction between the Fermi surfaces of the two types of 
WPs leads to marked differences in the thermodynamics of the hosting 
materials and their response to magnetic fields. In particular, in con- 
trast to a WP1, which exhibits a chiral anomaly? for any direction of 
the magnetic field, the chiral anomaly appears in a WP2 only when the 
direction of the magnetic field is within a cone where |T(k)| >|U(k)|. 
If the field direction is outside this cone, then the Landau-level spec- 
trum is gapped and has no chiral zero mode (see Supplementary 
Information). 
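The type-I versus type-II criterion can be checked numerically for a given coefficient matrix A. The sketch below evaluates T(k) and U(k) from equation (1) on a set of unit vectors and reports type II as soon as |T| exceeds U in some direction. The direction sampling (a Fibonacci sphere), the function names and the example matrices are illustrative assumptions, not part of the original analysis.

```python
import math


def kinetic_and_potential(A, k):
    """Return (T, U) of equation (1) for a 3x4 coefficient matrix A[i][j] and wave vector k."""
    T = sum(k[i] * A[i][0] for i in range(3))
    U = math.sqrt(sum(sum(k[i] * A[i][j] for i in range(3)) ** 2
                      for j in (1, 2, 3)))
    return T, U


def fibonacci_sphere(n):
    """Yield n roughly evenly spaced unit vectors."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    for m in range(n):
        z = 1.0 - 2.0 * (m + 0.5) / n
        rho = math.sqrt(max(0.0, 1.0 - z * z))
        yield (rho * math.cos(golden_angle * m),
               rho * math.sin(golden_angle * m),
               z)


def classify_weyl_point(A, n_dirs=20000):
    """Return 'type-II' if some sampled direction has |T(k)| > U(k), else 'type-I'."""
    for k in fibonacci_sphere(n_dirs):
        T, U = kinetic_and_potential(A, k)
        if abs(T) > U:
            return "type-II"
    return "type-I"


if __name__ == "__main__":
    # Hypothetical examples: isotropic cone with a weak or a strong tilt along k_x.
    weak_tilt = [[0.5, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    strong_tilt = [[2.0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(classify_weyl_point(weak_tilt), classify_weyl_point(strong_tilt))
```

Since T and U are both homogeneous of degree one in k, testing unit vectors is sufficient. A finite sample can miss a very narrow type-II cone, so a 'type-I' answer is only as reliable as the sampling density.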

On the lattice, the ‘no-go’ theorem!” guarantees that Weyl fermi- 
ons appear in pairs with Chern numbers of opposite sign. Because the 
Figure 1 | Possible types of Weyl semimetals. a, Type-I WP with a point- 
like Fermi surface. b, A type-II WP appears as the contact point between 
electron and hole pockets. The grey plane corresponds to the position of 
the Fermi level, and the blue (red) lines mark the boundaries of the hole 
(electron) pockets. 


Chern number of a WP is not changed by T(k), WPs of different type 
can be chiral/anti-chiral partners of each other. The number of WPs of 
a certain type can be odd, but the total number of WPs must be even 
(for example, there can be one WP1 and one WP2). 
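That the tilt leaves the Chern number unchanged can be seen in a short sketch. For the linear Hamiltonian above, the chirality of the WP is fixed, up to an overall sign convention, by the sign of the determinant of the 3 × 3 block A_ij with j = x, y, z. The column A_i0 that defines T(k) multiplies the unit matrix and does not enter. The function names and example matrices are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def chirality(A):
    """Return +1 or -1 from the sign of det of the Pauli-matrix block of a 3x4 matrix A.

    The tilt column A[i][0] is deliberately ignored.
    """
    pauli_block = [row[1:4] for row in A]
    d = det3(pauli_block)
    if d == 0:
        raise ValueError("degenerate block: not an isolated Weyl point")
    return 1 if d > 0 else -1


if __name__ == "__main__":
    untilted = [[0.0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    tilted = [[2.0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    partner = [[2.0, -1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(chirality(untilted), chirality(tilted), chirality(partner))
```

In this example the strongly tilted point carries the same charge as the untilted one, while reversing one velocity direction gives its partner of opposite charge.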

We now describe WTe2, a material we identified to host the new 
WPs. The crystal structure of WTe2 is orthorhombic with space group 
Pmn2_1 (C_{2v}^7). Its primitive unit cell contains four formula units. The 
atomic structure is layered, with single layers of W separated from each 
other by Te bilayers and stacked along the z axis (see Supplementary 
Information). The distance between adjacent W atoms is considerably 
smaller along the x axis than it is along the y or z axes, creating strong 
anisotropy. The unit cell has two reflection symmetries: a mirror in the 
y-z plane my, and a glide plane g,, formed by a reflection in the x-z 
plane followed by a translation by (0.5, 0, 0.5). Combined, they form a 
non-symmorphic twofold rotation C2 (that is, a twofold rotation that 
is combined with a translation by a fraction of a lattice constant), which 
is important in the following symmetry arguments. 

The result of band-structure calculations (see Supplementary 
Information) without spin-orbit coupling (SOC) is shown in Fig. 2a 
along the Γ-X direction, where an intermediate point Σ = (0.375, 0, 
0) is introduced. In addition to electron and hole pockets, 16 WPs per 
spin are found in WTe2 in the absence of SOC (not shown in Fig. 2a). 
Half of these points occur at points of low symmetry with k_z ≠ 0; the 
other half appear in the k_z = 0 plane, where the product of time rever- 
sal and C2 (C2T = C2·T) forms a little group. Generically, degeneracies 
on high-symmetry planes are forbidden; however, owing to the C2T 
symmetry, twofold degeneracies are locally stable at points in the k_z = 0 
plane. On the Γ-X line, the spectrum is generally gapped with a band- 
gap of approximately 1 meV, separating valence and conduction bands; 
see Fig. 2a. 

Accounting for spin, but without SOC, bands become doubly degen- 
erate, owing to opposite spin projections. This degeneracy doubles the 
topological charge of each WP because, by SU(2) symmetry, WPs 
corresponding to opposite spins have identical topological charge. 
Infinitesimal SOC cannot gap these WPs, giving a general criterion 
by which to search for Weyl semimetals: WPs are first found without 
SOC on the high-symmetry planes; the effects of SOC on these WPs 
are studied separately. 

In WTe2 SOC is not small. When turned on, it preserves electron 
and hole pockets, but substantially changes the structure of WPs. At 
intermediate SOC, WPs move, emerging or annihilating in pairs of 
opposite chirality. At full SOC, all WPs with k_z ≠ 0 are annihilated. 
In the k_z = 0 plane, double degeneracies at isolated k points are still 
allowed by symmetry. Eight such gapless points are found, formed by 
the topmost valence and lowest conduction bands at full SOC. A pair of 
such points is shown in Fig. 2c. The other three pairs are related to this 
one by reflections. Energetically, both points are located only slightly 
(0.052 eV and 0.058 eV) above the Fermi energy E_F; see Supplementary 
Information for details. 
Figure 2 | Band structure of WTe2. a, Band structure of WTe2 without 
SOC. A fraction of the Γ-X segment is shown: the point Σ has coordinates 
(0.375, 0, 0). A bandgap of approximately 1 meV is shown in the inset, 
signalling a gapless point nearby. b, Band structure of WTe2 with SOC. 
c, One of the four pairs of WPs is shown along the line K-K’, where 
K = (0.1208, 0.0562, 0) and K’ = (0.1226, 0.0238, 0). Their locations are 
designated in reduced coordinates (in units of reciprocal lattice constants). 


Establishing degeneracies of bands (and the existence of WPs) com- 
putationally (or by inspection) is prone to finite-size effects: a point 
thought to be a degeneracy point might turn out to have a minuscule 
gap upon increasing computational precision. To rigorously establish 
the presence of WPs, we performed many tests that involve computing 
topological indices. The topological charge (±1) of each WP was found 
using an extension of the Wilson-loop and hybrid-Wannier-centres 
methods”! to type-II Weyl semimetals. Z2 topological indices were 
also computed on several planes (including those in both standard and 
non-standard geometries) in the Brillouin zone. In total, these tests not 
only proved the existence of WPs, but also elucidated the structure of 
the Berry-flux connection between WPs and of the Fermi arcs on the 
surface of WTe2. The resultant Fermi-arc structure is consistent with 
the calculations presented below. A detailed description of topological 
indices and ways to obtain them are found in Supplementary 
Information. 

To check the nature of the WPs, we obtained the energy spec- 
trum around them from first-principles calculations and fitted it to 
the theoretical model derived by symmetry analysis (Supplementary 
Information). Considering only linear terms in k (the momentum rel- 
ative to the position of the WP), the spectrum in equation (1) becomes 


\[
\varepsilon_{\pm}(\mathbf{k}) = A k_x + B k_y \pm \sqrt{e^{2} k_z^{2} + (a k_x + c k_y)^{2} + (b k_x + d k_y)^{2}}
\]


The values of the parameters A, B, a, b, c, d and e are given in the 
Supplementary Information. The kinetic component of the energy 
dominates along the line connecting this WP to its nearest neighbour 
(see Fig. 2c and Supplementary Information). We thus conclude that 
WTe2 is a type-II Weyl semimetal. 
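
Comparing the linearized spectrum with equation (1) identifies the kinetic and potential parts explicitly. As a worked sketch, the type-II condition can be written out for a direction in the k_z = 0 plane. The angle φ is introduced here only for illustration, and no parameter values are assumed.

```latex
\[
T(\mathbf{k}) = A k_x + B k_y, \qquad
U(\mathbf{k}) = \sqrt{e^{2} k_z^{2} + (a k_x + c k_y)^{2} + (b k_x + d k_y)^{2}} .
\]
For $\mathbf{k} = k(\cos\varphi, \sin\varphi, 0)$ with $k > 0$, the condition $|T(\mathbf{k})| > U(\mathbf{k})$ becomes
\[
(A\cos\varphi + B\sin\varphi)^{2} > (a\cos\varphi + c\sin\varphi)^{2} + (b\cos\varphi + d\sin\varphi)^{2} .
\]
```

The WP is of type II if this inequality holds for at least one angle φ, and the overall scale k drops out because both sides are quadratic in k.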

We now discuss the Fermi surface topology and possible topolog- 
ical Lifshitz transitions in WTe2. The evolution of the Fermi surface 
obtained from first-principles calculations is shown in Fig. 3 for differ- 
ent values of E_F. Owing to reflection symmetries, only part of the k_z = 0 
plane of the Fermi surface is shown. For E_F = 0 eV, the Fermi surface is 
formed of two pairs of electron pockets and two pairs of hole pockets 
(eight pockets in total), which are separated in momentum space. For 
each pair, the larger pocket completely encloses the smaller one, in 
agreement with experiments”. This property is illustrated in Fig. 3a, 
where four halved pockets (two electron and two hole) are shown. The 
other halves are obtained by the glide reflection g,., and the remaining 
four pockets with k, > 0 are obtained by the mirror reflection my. All 
Fermi surfaces have zero Chern numbers when E_F = 0. 

When E_F is raised, two additional electron pockets appear; the 
previously existing electron pockets persist. The hole pockets shrink 
quickly, two disappearing completely. Each of the remaining two split 
into two disconnected pockets. As a result, there are six electron pock- 
ets and four hole pockets in total (see Fig. 3b). When the Fermi level 
Figure 3 | Fermi surface at k_z = 0. A part of the Brillouin zone is shown. 
a, E_F = 0 eV; electron pockets (blue and green solid lines) and hole pockets 
(red and magenta dashed lines) come in pairs. WP with Chern number +1 
(−1) is shown in red (blue). b, The representative structure of electron and 
hole pockets at higher energies (E_F = 0.055 eV shown). There are four hole 
pockets (one shown; dashed magenta line) and six electron pockets (halves 
of three of them shown; blue and green solid lines). The boxed region is 
the region shown in c-e for different values of E_F. c, E_F = 0.052 eV is set to 
the lower-energy WP. Contact between electron and hole pockets occurs 
at this WP. d, E_F = 0.055 eV is set to be between the two WPs. The electron 
and hole pockets are disconnected. The hole pocket encloses a WP with a 
Chern number C = +1. The electron pocket encloses the WP with C = −1 
and its mirror image (not shown); the net Chern number of this pocket is 
zero. e, When E_F = 0.058 eV is set to the higher-energy WP, electron and 
hole pockets touch again (shown). They reopen at larger E_F with zero 
Chern numbers. 


is tuned to the first WP, E_F = 0.052 eV (corresponding to the addition 
of approximately 0.064 electrons per unit cell), each of the two newly 
appeared electron pockets touches two hole pockets at the positions of 
the WPs, as illustrated in Fig. 3c for part of the k_z = 0 plane. Further increase 
of E_F disconnects the electron and hole pockets again (see Fig. 3d 
for E_F = 0.055 eV) but with changed topology: electron pockets still 
have zero Chern numbers because they enclose two WPs of opposite 
charge, related by g,.. The hole pockets have Chern numbers of +1. 
Topologies of the other hole pockets are obtained by changing the sign 
of the Chern number according to the appropriate mirror and glide 
symmetries. The pockets touch again (see Fig. 3e) when the Fermi 
level is tuned to the higher-energy WP, E_F = 0.058 eV (corresponding 
to approximately 0.079 additional electrons per unit cell). Upon raising 
E_F further, the pockets disconnect again, and all Fermi-surface Chern 
numbers become zero. 

To facilitate the observation of topological Lifshitz transitions, hydro- 
static pressure is applied. Neighbouring WPs are pushed away from 
each other in k space under compression. In particular, a 0.5% (2%) 
compression increases the distance between the WPs from 0.7% to 2% 
(4%) of the reciprocal vector |G2| (see Supplementary Information for 
a discussion of strain effects, including how to obtain only four WPs). 

Finally, we discuss the topological surface states of WTe2. Owing to 
reflection symmetries, WPs of opposite chirality are projected on top of 
each other on the (100) and (010) surfaces, which hence do not exhibit 
topologically protected surface states. 

For the (001) surface, all the WPs project onto distinct points; hence, 
topological surface states appear. When E_F is tuned to be between the 
Figure 4 | Topological surface states. a, Spectral function of the (001) 
surface. The Fermi level (green line) is set to be between the WPs. b, Fermi 
surface of the (001) surface and a Fermi arc connecting hole and electron 
pockets. Green crosses mark the positions of WPs. 


WPs, the hole pocket has non-zero Chern number and a Fermi arc 
emerges from it, connecting it to the WP of opposite Chern number 
inside the electron pocket. Figure 4a illustrates the spectral function 
of the (001) surface, where surface states connecting electron and hole 
bands are clearly visible. The Fermi surface of this surface has a top- 
ological Fermi arc (Fig. 4b) connecting projections of the topological 
hole (Fig. 3c-e) and electron pockets. The other surface state crossing 
the hole pocket emerges from the electron pocket (not seen in Fig. 4) 
and goes back into it, and thus can be pushed into the continuum of 
bulk states (see Supplementary Information). 

Of other transition metal dichalcogenides, another strong candidate 
material is MoTe2 (ref. 23), which is reported to be a semimetal resem- 
bling pressurized WTe2. This material can also be used to explore new 
physical phenomena arising in the new topological semimetal phase 
presented here. 
Ultrafast ultrasound localization microscopy for 
deep super-resolution vascular imaging 


Claudia Errico, Juliette Pierre, Sophie Pezet, Yann Desailly, Zsolt Lenkei, Olivier Couture & Mickael Tanter 


Non-invasive imaging deep into organs at microscopic scales 
remains an open quest in biomedical imaging. Although optical 
microscopy is still limited to surface imaging owing to optical wave 
diffusion and fast decorrelation in tissue, revolutionary approaches 
such as fluorescence photo-activated localization microscopy 
led to a striking increase in resolution by more than an order of 
magnitude in the last decade!. In contrast with optics, ultrasonic 
waves propagate deep into organs without losing their coherence 
and are much less affected by in vivo decorrelation processes. 
However, their resolution is impeded by the fundamental limits 
of diffraction, which impose a long-standing trade-off between 
resolution and penetration. This limits clinical and preclinical 
ultrasound imaging to a sub-millimetre scale. Here we demonstrate 
in vivo that ultrasound imaging at ultrafast frame rates (more than 
500 frames per second) provides an analogue to optical localization 
microscopy by capturing the transient signal decorrelation of 
contrast agents—inert gas microbubbles. Ultrafast ultrasound 
localization microscopy allowed both non-invasive sub-wavelength 
structural imaging and haemodynamic quantification of rodent 
cerebral microvessels (less than ten micrometres in diameter) 
more than ten millimetres below the tissue surface, leading to 
transcranial whole-brain imaging within short acquisition times 
(tens of seconds). After intravenous injection, single echoes from 
individual microbubbles were detected through ultrafast imaging. 
Their localization, not limited by diffraction, was accumulated over 
75,000 images, yielding 1,000,000 events per coronal plane and 
statistically independent pixels of ten micrometres in size. Precise 
temporal tracking of microbubble positions allowed us to extract 
accurately in-plane velocities of the blood flow with a large dynamic 
range (from one millimetre per second to several centimetres 
per second). These results pave the way for deep non-invasive 
microscopy in animals and humans using ultrasound. We anticipate 
that ultrafast ultrasound localization microscopy may become an 
invaluable tool for the fundamental understanding and diagnostics 
of various disease processes that modify the microvascular blood 
flow, such as cancer, stroke and arteriosclerosis. 

The recent discovery of super-resolution optical microscopy led to 
a revolutionary improvement of resolution through the use of differ- 
ent technical approaches!”. One major implementation, fluorescence 
photo-activated localization microscopy (FPALM), exploits the 
stochastic blinking of specific fluorescent sources to separate them 
into individual events in independent frames. A super-resolved image 
is obtained by localizing the centre of each separable source and accu- 
mulating these positions over thousands of acquisitions. The resulting 
image highlights structures that are hundreds of times smaller than the 
wavelength, such as the cell membrane and small organelles’. 

In clinical ultrasound imaging, intravenously injected contrast 
agents (1-3-μm-diameter microbubbles) act as intravascular acoustic 


sources to reveal the vascular bed. At typical concentrations, a cloud 
of microbubbles can be considered as a sub-wavelength random dis- 
tribution of Rayleigh scatterers. The resolution of ultrasound contrast 
imaging is limited by the classical wave diffraction theory and cor- 
responds roughly to the ultrasonic wavelength (typically between 
200 μm and 1 mm in clinical applications). Nevertheless, thanks to 
the advent of ultrafast ultrasound imaging’, we recently proposed an 
ultrasound equivalent of FPALM®* that surpassed the conventional 
diffraction limit of echography by more than tenfold. The use of ultra- 
fast acquisitions based on plane wave transmissions at the rate of a 
thousand frames per second may lead to several key advantages when 
imaging contrast agents. First, the decorrelation of the microbubble 
signal from frame to frame is typically in the millisecond range’. 
As the tissue signature decorrelates more slowly than the microbubble 
signal, it is thus removed by simply applying a differential subtraction 
filter of consecutive frames. Second, since they respond to ultrasound 
differently over several frames, microbubbles blink separately through 
the spatiotemporal differentiation process and become temporally sep- 
arable sources. Last, since the ultrasonic sequence provides simulta- 
neously very high temporal resolution in all pixels of the image, it 
becomes possible to track the signature of many individual micro- 
bubbles both in space and time and thus to quantify the local blood 
flow speed over a very large dynamical range. As ultrasonic waves 
can penetrate several centimetres of tissue, extracting the positions of 
each of these bubbles could lead to the full reconstruction of the deep 
vascular system down to the level of capillaries. However, the useful- 
ness of these theoretical benefits remains to be demonstrated in vivo. 
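The differential subtraction of consecutive frames mentioned above can be sketched in a few lines. The frames are assumed to be 2D lists of real or complex pixel values taken at a fixed frame rate, and the function name is hypothetical. This is a minimal illustration of a frame-to-frame difference, not the authors' processing chain.

```python
def differential_filter(frames):
    """Subtract each frame from the next one, pixel by pixel.

    frames: list of 2D lists (rows of pixel values) with identical shapes.
    Returns len(frames) - 1 difference images.
    """
    return [
        [[later - earlier for earlier, later in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


if __name__ == "__main__":
    # Static 'tissue' of value 5 everywhere, plus a bright source that
    # moves one pixel to the right between the two frames.
    frame_0 = [[5, 5, 5], [5, 9, 5]]
    frame_1 = [[5, 5, 5], [5, 5, 9]]
    print(differential_filter([frame_0, frame_1]))
```

Anything that stays constant between frames cancels exactly, so only the moving or changing source survives. Slowly varying tissue leaves a small residual that shrinks as the frame rate increases.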

Current methods for in vivo microvascular imaging are limited by 
trade-offs between the depth of penetration, resolution and acqui- 
sition time. For instance, microcomputed tomography® and mag- 
netic resonance imaging” are able to resolve vessels down to a few 
tens of micrometres with deep tissue penetration, but they remain 
limited by long scanning times. Near-infrared II fluorescence 
imaging’® has high spatial resolution (~50 μm) and fast acquisition 
times (<200 ms). Nevertheless, it lacks sufficient tissue penetration 
(<1-3 mm) for whole-brain imaging. High-resolution photoacoustic 
imaging!! does not require contrast agents and can attain resolutions 
of a few micrometres, but also lacks penetration (0.75 mm). Finally, 
acoustic angiography resolves tumour vessels around 150 μm in 
diameter, but is still hampered by the trade-off between penetration 
and resolution”. 

Here, we demonstrate ultrafast ultrasound localization microscopy 
(uULM), which combines deep penetration and super-resolution 
imaging at unprecedented spatiotemporal resolution, by using clin- 
ically approved contrast agents: inert gas microbubbles. uULM is 
implemented in vivo on anaesthetized male Sprague-Dawley rats fixed 
within a stereotactic frame. Their skull was either left intact or thinned 
to reduce the acoustic attenuation caused by the bone. We used a small 
ultrasonic probe, connected to a fully programmable ultrafast ultra- 
sound scanner to image a coronal slice of the brain. 

The major challenge of ULM is to intercept a sufficient number 
of separable sources (microbubbles) in the blood stream to obtain 
super-resolved vasculature maps over a large region within a reason- 
able acquisition time. Therefore, we detected microbubbles in the 
rat brain cortex by looking at their fast decorrelation within a stack 
of 75,000 images acquired continuously for 150s. The millisecond- 
timescale decorrelation of the microbubble signal can be generated 
by several processes, including disruption, dissolution and motion. In 
the current implementation, pulse sequences were chosen to reduce 
ultrasound-induced disruption or dissolution of microbubbles. As 
microbubbles are point scatterers and since small variations of phase 
can be detected in the radio-frequency data, microbubble displace- 
ment much smaller than the wavelength appears as a strong decorre- 
lation signal on differential filtered images. Moreover, by exploiting 
the coherence of backscattered signals, the spatiotemporal filtering 
approach discriminates slowly moving objects of sub-wavelength 
size (low spatial coherence), that is, bubbles, from slow motion tissue 
signals whose temporal variations affect many neighbouring pixels 
the same way (high spatial coherence). The ultrafast frame rate was 
achieved by emitting plane waves and collecting the backscattered 
echoes with all the array elements. For each transmission, the resulting 
echoes were exploited to reconstruct in silico an entire ultrasonic frame 
by using parallel beamforming. In the averaged stack of ultrasound 
images only the thinned skull was observable (Fig. 1a). The decorre- 
lation of bubbles was detected using frame-to-frame differential pro- 
cessing, which yields individual and fast-changing sources within the 
ultrafast ultrasound images (Fig. 1b). This high-pass filter uses the very 
high spatiotemporal sampling to eliminate tissue and skull signals. 
Since microbubbles are much smaller than the wavelength (1–3 µm
versus 100 µm) and can be individually separated in space and time,
they appeared as the point-spread function (PSF) of the ultrasound 
system. The spatial coordinates of the bubble centroids were extracted 
one by one by deconvolving the individual sources from the predicted 
Gaussian PSF. As these sources are locally unique, each of these positions
can be estimated with a 2.5 µm maximum theoretical resolution
in the axial direction. For example, a blinking microbubble flowing in 
vessels at the level of the primary somatosensory forelimb or hindlimb 
cortex (S1HL/FL), appeared as a spot representing the centre of the 
interpolated PSF (Fig. 1c).
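
As an illustration only, the detection and localization steps described above might be sketched in Python as follows. The synthetic data, the function names and the intensity-weighted centroid (used here as a simple stand-in for deconvolution with the Gaussian PSF) are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def differential_filter(stack):
    """Frame-to-frame difference along time: a crude temporal high-pass filter.

    stack has shape (n_frames, n_z, n_x). Static tissue cancels out,
    whereas fast-decorrelating sources survive.
    """
    return np.abs(np.diff(stack, axis=0))

def localize_peak(frame, half_window=2):
    """Sub-pixel (z, x) centroid of the brightest source in one filtered frame.

    Assumption: an intensity-weighted centroid in a small window around the
    maximum replaces the Gaussian PSF deconvolution described in the text.
    """
    z0, x0 = np.unravel_index(np.argmax(frame), frame.shape)
    z_lo, z_hi = max(z0 - half_window, 0), min(z0 + half_window + 1, frame.shape[0])
    x_lo, x_hi = max(x0 - half_window, 0), min(x0 + half_window + 1, frame.shape[1])
    patch = frame[z_lo:z_hi, x_lo:x_hi]
    zz, xx = np.mgrid[z_lo:z_hi, x_lo:x_hi]
    total = patch.sum()
    return (zz * patch).sum() / total, (xx * patch).sum() / total

def gaussian_spot(shape, z_c, x_c, sigma=1.0):
    """Synthetic Gaussian PSF centred at a non-integer pixel position."""
    zz, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((zz - z_c) ** 2 + (xx - x_c) ** 2) / (2 * sigma ** 2))

# Hypothetical data: constant "tissue" plus a bubble visible in frame 1 only.
shape = (32, 32)
tissue = 5.0 * np.ones(shape)
stack = np.stack([tissue, tissue + gaussian_spot(shape, 12.3, 20.6), tissue])

filtered = differential_filter(stack)      # tissue background is removed
z_hat, x_hat = localize_peak(filtered[0])  # sub-pixel estimate of the bubble centre
```

The centroid lands within a fraction of a pixel of the simulated position, which is the sense in which an isolated source can be localized far more finely than the width of its PSF.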

Typically, we localized about 1,000,000 events in 150 s within one
hemisphere of the brain cortex. Furthermore, we were able to track 
each moving bubble according to its instantaneous position and 
in-plane velocity vector, leading to quantitative and localized maps 
of cerebral blood flow velocity. Hence, ultrafast imaging allows the 
reconstruction of entire organs within tens of seconds, a prerequi- 
site for a preclinical and clinical modality. Far beyond a technological 
leap, ultrafast imaging ensures the necessary discrimination between 
single bubble signatures and tissue at high bubble concentrations 
using optimal spatiotemporal clutter filters (ref. 13). By tracking the local
motion of bubbles at a kHz rate, it estimates their motion over a very 
large dynamic range of velocities and consequently vessel diameters 
(1 mm s⁻¹ to several cm s⁻¹ and 15 µm to 1-5 mm, respectively)
during a sufficiently long acquisition time simultaneously in all vox- 
els of the image. Finally, in fast-moving or pulsatile organs, tissue 
motion correction could be assessed through speckle tracking with 
micrometric sensitivity to co-register bubble positions in real time or 
post-processing (refs 8, 14, 15). This remains a fundamental asset with respect
to individual bubble localization techniques based on conventional 
ultrasound sequences, recently discussed (ref. 16), which need to separate
echoes through high dilution of contrast agents and image clamped
tissue for extended durations (1 h) because of limited frame rates (ref. 17).
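
A minimal sketch of how in-plane velocities could be derived from the successive positions of one tracked bubble is given below. The function name and the sample track are hypothetical; only the 500 Hz frame rate is taken from the Methods.

```python
import numpy as np

def track_velocities(positions_um, frame_rate_hz):
    """In-plane velocity vectors (mm per s) between consecutive track positions.

    positions_um: array of shape (n, 2) holding (z, x) in micrometres,
    one row per frame.
    """
    positions_um = np.asarray(positions_um, dtype=float)
    dt_s = 1.0 / frame_rate_hz
    displacement_um = np.diff(positions_um, axis=0)
    return displacement_um / dt_s * 1e-3  # micrometres per s -> mm per s

# Hypothetical bubble moving 4 µm in depth and 3 µm laterally per frame.
track = np.array([[0, 0], [4, 3], [8, 6], [12, 9]])
v = track_velocities(track, frame_rate_hz=500.0)
speed = np.linalg.norm(v, axis=1)  # 5 µm per 2 ms, that is, 2.5 mm per s
```

Because both components of the displacement are kept, sideways motion is measured as readily as motion towards or away from the probe.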

We obtained extremely detailed structural reconstructions of the 
microvasculature in the rat brain cortex (5 mm width and 3 mm

Figure 1 | Principle of uULM. a, Ultrafast detection of individual sources
from a low-quality B-mode image (averaged stack of 250 beamformed
images), through a thinned skull. b, Four representative frames were
separated by 44 ms (t1–t4) and filtered to remove the slow-moving
tissue signal. c, Three independent microbubbles blinking over several
milliseconds from b were followed in the region of interest within
the cortex. The echo of each bubble event (high-contrast pixels) was
deconvolved with the PSF to obtain the exact position of the centroid
(red crosses). Superposition of thousands of occurrences yields a highly
resolved localization map for this region.


depth) under the thinned skull window (Fig. 2a), displaying vessels 
with diameters between 15 µm and 65 µm. The images were reconstructed
with a pixel size of 10 µm × 8 µm, corresponding to a tenfold
increase in resolution as compared to conventional ultrasound imag- 
ing. Furthermore, bifurcations of the penetrating arterioles within 
the S1HL/FL were easily observable down to the terminal branching 
points (Fig. 2a), where vessels attain the hypovascular white matter (ref. 18).
In comparison, the contrast-enhanced image created using conventional
power Doppler is limited by diffraction (Fig. 2b; ref. 19), highlighting
only the large vessels of the rat brain cortex without distinguishing 
details below the wavelength scale. Moreover, Doppler detection is 
strongly biased towards flows that are perpendicular to the array. 

More detailed analysis of the cross-section of individual vessel 
profiles, indicated by lines 1 and 2 in Fig. 2a, yielded diameters
of 17 µm and 9 µm full-width at half-maximum, respectively, corresponding
to capillaries (ref. 20) (Fig. 2c). These values represent a convolution
between the actual size of the vessel and the response of the
localization microscopy method, giving an upper limit to its resolution 
(wavelength λ/10). Investigation of a branching vessel profile (profile 3
in Fig. 2a) showed that at a distance of 16 µm (λ/6), the two vessels are
still clearly separated. Such high resolution depends on the number of
bubbles present in the reconstructed pixel (10 µm × 8 µm) and could
thus be further improved with longer integration times. 
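
The full-width at half-maximum of a vessel cross-section can be estimated from a sampled profile as sketched below. This is an illustrative routine under simple assumptions (a single-peaked profile, linear interpolation at the half-maximum crossings); the function name and the synthetic Gaussian profile are not from the study.

```python
import numpy as np

def fwhm(x, y):
    """Full width at half maximum of a single-peaked profile y(x).

    The two half-maximum crossings are located by linear interpolation
    between neighbouring samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    left, right = above[0], above[-1]

    def crossing(i_out, i_in):
        # Interpolate between a sample below half (i_out) and one at or above it (i_in).
        return x[i_out] + (half - y[i_out]) * (x[i_in] - x[i_out]) / (y[i_in] - y[i_out])

    x_left = crossing(left - 1, left) if left > 0 else x[left]
    x_right = crossing(right + 1, right) if right < len(x) - 1 else x[right]
    return x_right - x_left

# Hypothetical density profile across a vessel, sampled every 1 µm.
x_um = np.arange(-30.0, 31.0, 1.0)
sigma_um = 4.0
profile = np.exp(-x_um ** 2 / (2 * sigma_um ** 2))
width_um = fwhm(x_um, profile)  # close to 2.355 * sigma for a Gaussian
```

As noted in the text, a width measured in this way is a convolution of the true vessel size with the response of the method, so it should be read as an upper bound on the resolution.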

Next, we evaluated the ability of our method to measure blood flow 
dynamics in cortical microvessels. Measured blood flow in-plane 
velocities in the rat brain showed a large dynamic range up to several 
cm s⁻¹ for large vessels and down to 2 mm s⁻¹ in small vessels. Blood
flow velocity inside of the relatively large penetrating artery was well 
resolved (profile 4 in Fig. 2d, e, profile 5 in Fig. 2d, f) and increased
with vessel diameter, showing 15 mm s⁻¹ maximum velocity
at 80 µm diameter and 2 mm s⁻¹ maximum velocity at 15 µm diameter,
consistent with the literature values (refs 21, 22). Interestingly, it was clear that
larger vessels support higher flow within their centre with respect 

Figure 2 | Spatial resolution and quantification of uULM in the
rat brain cortex through a thinned skull window. a, Microbubble
density maps were reconstructed with a spatial resolution of λ/10 (pixel
size = 8 µm × 10 µm). b, Same area in a conventional power Doppler
image. c, Interpolated profiles along the lines marked in a display 9 µm
vessels (2) and resolve two vessels closer than 16 µm (3). a.u., arbitrary
units. d, Dynamic tracking of bubbles separates vessels in two populations
with opposite blood flow direction. Positive values indicate blood flow
away from the probe. Bubble velocities between 1 mm s⁻¹ and 14 mm s⁻¹
are detectable. e, f, Velocity profiles associated with lines 4 (e) and
5 (f) in d. Red line, median; blue box, 25th to 75th percentile; whiskers
extend to the most extreme data points that are not considered outliers;
other points, outliers. Unpaired Student's t-test. *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001.

Figure 3 | ULM of the rat brain through a thinned skull window or
through the intact skull. a, uULM performed through a thinned skull at a
coronal section, Bregma −1.5 mm, providing a resolution of 10 µm × 8 µm
in depth and lateral direction, respectively. c, uULM performed through
the intact skull at Bregma −1 mm. Owing to the attenuation of the
ultrasound waves in the presence of the bone, the achieved resolution

was 12.5,1m x 11m in depth and lateral direction, respectively. Thus, the 
smallest vessel detectable was 20 µm wide. b, d, In-plane velocity maps
from parts of the vessels in a and c, respectively. 


to their periphery. Although the images are integrated over a slab of 
about 100 µm thick, we could separate two sets of vessels simply on
the basis of their flow velocities. Some bubbles were travelling at a
much slower speed, in the direction opposite to that of the background
venules. Moreover, in contrast with conventional ultrasound Doppler
imaging, which is sensitive mostly to flow towards or away from the 
ultrasound probe, here we also observed and measured microbubbles 
that were moving sideways. This is particularly useful to observe the 
tortuosity of the small vessels and detect abrupt branching in vessels 
within the cortex. 

In-plane velocity measurements can define the resolution of ULM. 
We consider that two resolution cells are distinguishable if their veloc- 
ity distributions are statistically different (P < 0.05). The median of 
the upper half of the velocity distribution for each resolution cell is 
displayed in Fig. 2e, f. When the resolution cells are 8–12 µm in size,
adjacent pixels can be considered distinct. Interestingly, the maximum 
velocities follow a parabolic profile, as expected for vessels of this size. 
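
The per-cell statistics described above can be sketched as follows, assuming that each resolution cell is represented by a list of bubble speeds. The function names and the sample values are hypothetical, and only the t statistic and its degrees of freedom are computed; converting them to a P value would require the Student t distribution.

```python
import numpy as np

def upper_half(values):
    """Fastest half of the values (the middle value is kept when the count is odd)."""
    v = np.sort(np.asarray(values, dtype=float))
    return v[len(v) // 2:]

def cell_summary(values):
    """Median, mean and sample standard deviation of the fastest half of a cell."""
    top = upper_half(values)
    return np.median(top), top.mean(), top.std(ddof=1)

def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance.

    Returns the statistic and its degrees of freedom.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    t = (a.mean() - b.mean()) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2

# Hypothetical speeds (mm per s) in two adjacent resolution cells.
cell_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cell_b = [0.5, 1.0, 1.5, 1.0, 2.0, 3.0]
median_a, mean_a, sd_a = cell_summary(cell_a)
t_stat, dof = student_t(upper_half(cell_a), upper_half(cell_b))
```

Two cells would then be called distinguishable when the statistic corresponds to P < 0.05 for the given degrees of freedom.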

Finally, we investigated the spatial coverage of our imaging method. 
At 15 MHz, the attenuation of an ultrasound wave within brain tissue
is approximately 5 dB cm⁻¹ (ref. 23), which allows imaging at
several centimetres depth. Super-resolved images could be obtained 
in vivo over the entire depth of the brain (12.5 mm at Bregma −1.5 mm;
Fig. 3a), demonstrating that ULM can map vessels below the rat 
brain cortex over several coronal planes (Extended Data Fig. 2). 
Super-resolved imaging is also possible through the intact skull 
(Fig. 3c; Bregma −1.0 mm) but the lower signal-to-noise ratio, resulting
from skull-induced signal attenuation, globally reduces the number
of localized microbubbles, increasing the limit of the smallest
detectable vessel. However, this non-invasive version of our imaging
method can still detect vessels that are 20 µm wide and distinguish
vessels that are 20 µm apart deep into the brain (>8 mm). In
the future, the resolution could be further improved by localizing the 
microbubbles directly from radio-frequency data, which could also 
allow the correction of aberrations from the skull (refs 24, 25).

In conventional clinical ultrasound imaging applications, resolu- 
tion is inherently correlated to the ultrasonic frequency and, con- 
sequently, is inversely correlated to penetration depth. However, in 
uULM, resolution is related to the signal-to-noise ratio, the bandwidth 
of backscattered echoes and the number of array elements used in 
the beamforming process. This indicates that very high resolution 
could be reached, even deep into organs, in clinical applications. As 
microbubbles are clinically approved contrast agents and our acoustic 
parameters are well within the US Food and Drug Administration 
guidelines, such clinical applications could be rapidly implemented 
with conventional transducers. For these reasons, it is conceivable that 
dynamic images of the human brain vasculature could be achieved 
with lower frequency ultrasound (around 1 MHz) that can penetrate 
the skull. Ultrafast ultrasound localization could also be applied to
other deep-seated organs that are currently imaged with ultrasound, such as
liver, kidney or breast, by implementing appropriate motion-
correction algorithms. Such algorithms can be performed through
image registration based on the cross-correlation of the radio- 
frequency signal acquired at high frame rates, which can detect motion 
at the micrometric scale (refs 8, 14, 15). The microbubble events necessary for
uULM can then be motion compensated thanks to this co-registered 
image. Consequently, this technique will probably have an important 
impact on the study and diagnostics of normal biological processes or 
diseases such as tumour-related angiogenesis. 

We demonstrate super-resolution images of rat brain microvessels 
with pixel sizes comparable to the size of red blood cells, indicating 
that vessels ten times smaller than the ultrasonic wavelength can 
be mapped. Since ultrafast localization imaging can be performed 
through the skull, non-invasive longitudinal studies may be envisioned 
in the future over single or multiple planes within very reasonable 
acquisition times in preclinical or clinical studies. ULM, by removing 
the diffraction-induced trade-off between resolution and penetration 
of ultrasound waves, emerges as the first in vivo technique for imaging 
and quantifying blood flow at microscopic resolution deep into living 
organs. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS


Theoretical resolution limit. The given theoretical resolution limit corresponds 
to the position error of the localization process (ref. 8). This PSF deconvolution for
single isolated spots is inherently limited by the number of channels used in 
receive processing and the timing resolution of the acquisition system. The latter 
is limited mainly by the sampling frequency of the echoes before beamforming. 
An approximate value for the theoretical resolution limit in the axial dimension 
can be obtained by propagating the sampling error in a time-of-flight model, 
which yields: 


$$\sigma_{z_0} \approx \frac{c\,\sigma_\tau}{2\sqrt{n}}$$


where $\sigma_{z_0}$ is the localization error in the axial dimension, $c$ is the sound speed,
$\sigma_\tau$ is the timing resolution of the system and $n$ is the number of channels used in
receive processing. Note that the lower limit of the timing resolution is linked to 
the Cramér–Rao lower bound (CRLB), which describes the minimum obtainable
estimation error variance when using an unbiased estimator. The CRLB for
ultrasonic displacement estimation was derived by Walker and Trahey (ref. 26).
The standard deviation $\sigma_\tau$ of arrival-time estimates
compared to the theoretical one is described by the relation:
where $f_0$ is the transmit pulse centre frequency, $B$ is the pulse bandwidth, $T$ is the
kernel size for the time delay estimation, $\rho$ is the normalized correlation between
signals (that is, the correlation between the experimental signal and the reference
signal used for the PSF decorrelation), and SNR is the signal-to-noise ratio of
the received signals.

For the lateral resolution, the size of the aperture must also be taken into account 
as in any classical imaging modality: 
where $\sigma_{x_0}$ is the localization error in the lateral dimension, $f$ is the focal length
and $D$ is the length of the transducer array (which is the imaging aperture here).
Following these theoretical models, it is predicted that the 15 MHz array used
in this study could attain a maximum resolution (full-width at half-maximum)
of 2.5 µm in the axial direction and 5 µm in the lateral direction at 1 cm depth.
In humans, lower frequencies are exploited to attain 10 cm penetration. With the
same theoretical model, we can predict a 6 µm isotropic resolution with a current
transducer matrix (32 × 32 elements, 300 µm spatial pitch, 2.5 MHz frequency,
70% frequency bandwidth, $\rho$ > 0.9, 12 dB SNR at 5 cm depth).
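
For orientation, the axial estimate can be evaluated numerically as sketched below. The time-of-flight expression follows the symbol definitions given above. The arrival-time bound is written in the commonly quoted form of the Walker and Trahey result, which is an assumption here because the relation itself is not reproduced in this text. The kernel duration, the sound speed, the reading of B as a fractional bandwidth and the treatment of SNR as an amplitude ratio are likewise assumptions, so the output should not be expected to reproduce the values quoted above.

```python
import math

def arrival_time_jitter(f0_hz, frac_bandwidth, kernel_s, rho, snr):
    """Lower bound on the standard deviation of a time-delay estimate (seconds).

    Assumed form (commonly quoted Walker and Trahey bound):
    sqrt( 3 / (2 f0^3 pi^2 T (B^3 + 12 B)) * ((1 / rho^2) (1 + 1 / SNR^2)^2 - 1) )
    with B the fractional bandwidth and SNR an amplitude ratio.
    """
    geometry = 3.0 / (2.0 * f0_hz ** 3 * math.pi ** 2 * kernel_s
                      * (frac_bandwidth ** 3 + 12.0 * frac_bandwidth))
    signal = (1.0 / rho ** 2) * (1.0 + 1.0 / snr ** 2) ** 2 - 1.0
    return math.sqrt(geometry * signal)

def axial_localization_error(sound_speed_m_s, sigma_tau_s, n_channels):
    """Axial localization error c * sigma_tau / (2 * sqrt(n)), in metres."""
    return sound_speed_m_s * sigma_tau_s / (2.0 * math.sqrt(n_channels))

# Parameters loosely based on the human-imaging example in the text.
snr_amplitude = 10 ** (12.0 / 20.0)  # 12 dB, assumed to be an amplitude ratio
kernel_s = 3.0 / 2.5e6               # hypothetical kernel of three periods
sigma_tau = arrival_time_jitter(2.5e6, 0.7, kernel_s, 0.9, snr_amplitude)
sigma_z = axial_localization_error(1540.0, sigma_tau, 32 * 32)  # 1540 m/s assumed
```

The square-root dependence on the number of receive channels means that quadrupling the channel count halves the axial error for a given timing resolution.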

Animals. All experiments were performed in agreement with the European 
Community Council Directive of 22 September 2010 (2010/63/EU) and the local
ethics committee (Comité d'éthique en matière d'expérimentation animale no. 59,
C2EA-59, ‘Paris Centre et Sud’). Accordingly, the number of animals in our study
was kept to the necessary minimum. Experiments were performed on n = 3 male
Sprague-Dawley rats (Janvier Labs), weighing 200-225 g at the beginning of the
experiments. Animals arrived in the laboratory 1 week before the beginning of the
experiment, and were housed three per cage. They were kept at a constant temperature
of 22 °C, with a 12 h alternating light/dark cycle (light 7 a.m. to 7 p.m.). Food
and water were available ad libitum. 

Preparation of the thinned-skull imaging windows. The skull of the rats was 
thinned to 75–100 µm over an area of approximately 0.6 cm × 0.9 cm. The thinned
window suits the dimension of the ultrasound linear array (0.08 mm per element;
128 elements = 10.24 mm width). The surgical procedure was performed 1-2 days
before imaging under anaesthesia using intraperitoneal injections of medetomidine
(Domitor; 0.3 mg kg⁻¹) and ketamine (Imalgène; 40 mg kg⁻¹). The head
of the animal was placed in a stereotaxic frame and the skull bone was drilled 
(Foredom) at low speed with a micro drill steel burr (Fine Science Tools, catalogue 
no. 19007-07). To prevent swelling or oedema of the cerebral cortex, the skull
was frequently cooled with saline and an airstream during the thinning procedure,
as described previously (ref. 27). The thinned window was protected by a small
(1 cm × 1 cm) plastic cover, and the skin was sutured using 5-0 non-absorbable
Ethicon thread. Preliminary experiments showed that this method enabled good
quality ultrasound imaging results within 24 h to 3 days after the preparation, after
which the bone tends to regrow.

Preparation of ultrasound contrast-agent microbubbles. To reconstruct the 
vascular microstructure of the rat brain, 1–5 µm perfluorocarbon-filled microbubbles
(Bracco) were diluted in 0.9% NaCl to yield an initial concentration of
2 × 10⁸ microbubbles per ml. This concentration corresponds to approximately
500,000 bubbles per ml of blood per injection, which corresponds to the maximal
dose injected in clinical practice for superficial contrast-enhanced ultrasound (ref. 28).
Ultrafast ultrasound localization microscopy was performed in the brain by
injecting a maximum of 18 boluses (corresponding to 2.7 ml of the initial
suspension) through the catheterized jugular vein. The coronal ultrafast acquisitions
of the brain were performed every 15 min to guarantee that the injected
boluses had been cleared out.

Ultrafast ultrasound imaging sequence. Ultrasound imaging was performed 
using ultrafast Doppler imaging based on compounded plane-wave ultrasound 
transmissions (refs 29, 30). The hardware of the ultrasound scanner was not modified.
Ultrafast sequences were initiated and processed through software-based sequence 
encoding and data were imported through a PCI Express fast bus for GPU-based
post-treatment. 

Owing to its high spatiotemporal resolution (1 ms, 100 µm) (ref. 31), this technique
can measure small haemodynamic changes related to the neurovascular
coupling. Real-time B-mode imaging was used to control the placement of the 
probe on the field of view. In detail, we developed a plane-wave compounded ultra- 
fast imaging sequence (three tilted plane waves, —3°, 0° and 3°, pulse-repetition 
frequency PRF = 1,500 Hz) to perform a scan of the entire brain and have a detailed 
overview of its microvasculature over different coronal imaging planes at a high 
frame rate (500 Hz). Our ultrasonic probe is a custom-built array with 160 elements
and a central frequency of 20.3 MHz (pitch = 0.08 mm, elevation focus = 10 mm).
Its 15.4 MHz bandwidth allowed the use of this probe at a frequency of 15 MHz.
The signal from the 16 elements on either side is discarded because the probe is mounted on a
fully programmable clinical ultrasound scanner with 256 channels in transmission
and 128 parallel channels in reception. Data are transferred using a 16×, 6 Gb s⁻¹
PCI Express bus and processed using a 12-core 3 GHz Xeon processor and an NVIDIA
Quadro K5000 graphics processing unit with a bus at 173 Gb s⁻¹, providing
2.1 teraflops. Such software-based architecture enables programming of custom
transmit/receive sequences where the frame rate of each acquisition can reach
more than 20 kHz. The linear array was coronally fixed at the anterior–posterior
coordinate of Bregma −0.5 mm and translated in 500 µm steps with a motor
to scan and retrieve the vasculature of the whole brain along 2. cm. Each pressure 
transmit pulse consists of 6 cycles (21s duration at 15 MHz) at a 1.5 MPa peak 
rarefaction acoustic pressure (mechanical index = 0.4). These pressure amplitudes 
are chosen to reduce the ultrasound-induced disruption of microbubbles and to 
allow the tracking of these agents over several images. 

Boluses of 150 µl microbubbles were injected at the beginning of each ultrafast
acquisition. Once the scan was completed, we fixed the probe above Bregma
−1.0 mm to continuously insonify the rat cortex (3.5 mm depth) for 150 s. Ten
minutes of acquisition were required for each coronal plane of the whole-brain
scan (11.6 mm depth). In this latter case, we injected two 150 µl boluses of contrast
agents (at the beginning of the ultrafast acquisition and in the middle, 5 min) to
avoid a drop in the microbubble concentration due to the dynamics of the boluses.
The backscattered echoes were recorded, beamformed with A-line spacing and 
coherently added to produce an echographic image at each transmission. Successive 
raw images corresponding to three different transmission angles at 1,500 Hz PRF 
are then coherently added to produce one higher-contrast ultrasonic image for 
each set of tilted angles at a 500 Hz frame rate. 
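
The compounding step can be illustrated with the short sketch below, in which random complex arrays stand in for beamformed images. The function name and array sizes are hypothetical; the pulse-repetition frequency and the number of tilted angles are those stated above.

```python
import numpy as np

def compound(frames, n_angles):
    """Coherently sum consecutive groups of n_angles complex beamformed images.

    frames has shape (n_transmissions, n_z, n_x); trailing transmissions
    that do not fill a complete group are dropped.
    """
    frames = np.asarray(frames)
    n_groups = frames.shape[0] // n_angles
    grouped = frames[: n_groups * n_angles].reshape(
        n_groups, n_angles, *frames.shape[1:]
    )
    return grouped.sum(axis=1)  # complex sum keeps the phase information

prf_hz = 1500.0
n_angles = 3                       # tilts of -3, 0 and 3 degrees
frame_rate_hz = prf_hz / n_angles  # compounded frame rate

# Stand-in data: nine transmissions of a 4 x 4 complex image.
rng = np.random.default_rng(0)
raw = rng.standard_normal((9, 4, 4)) + 1j * rng.standard_normal((9, 4, 4))
compounded = compound(raw, n_angles)
```

Summing complex values before taking the magnitude is what makes the compounding coherent; summing magnitudes instead would discard the phase relationships between the tilted transmissions.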

Data treatment for bubble localization. High-pass spatiotemporal filtering was
implemented on the stack of the ultrafast images to discriminate the high temporal 
components, belonging to the blood signal, from the slow-moving tissue. Next, 
the stack of filtered ultrafast acquisition was rescaled via interpolation, yielding 
super-resolved output images with a pixel size of 10 µm × 8 µm. Since the bubbles
are much smaller than the wavelength (1–3 µm versus 100 µm) and can be individually
separated in space and time, they appear as the PSF of the ultrasound system.
This PSF is well behaved with respect to the theory of acoustic diffraction because
human and animal soft tissues can be considered homogeneous for acoustic properties
at first-order approximation (ref. 32).

Thereafter, we computed a Gaussian low-pass spatial filter and extracted a 
two-dimensional PSF for deconvolution of the rescaled ultrafast acquisitions. Hence, 
each individual bubble was localized, across all frames in the axial position and in 
depth, with a Gaussian two-dimensional profile whose summit represents the cen- 
troid of each separable source (Extended Data Fig. 1). Only 50% of the maximum 
of the full-width at half-maximum was kept to reconstruct the density maps of the 
bubbles; such thresholding helped cancel unwanted noisy signals. Additionally, to 
avoid any artefact corresponding to independent neighbouring bubble events, only 
bubbles that could be followed for at least 2 ms were included. Eventually the bubbles 
were counted and grouped according to their closeness. Almost 1.2 million bubbles 
were counted in the rat cortex within 74,800 frames. Supplementary Video 1 shows 
the reconstruction of each vessel through the passage of individual microbubbles. 
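
Accumulating the localized centroids into a density map amounts to a two-dimensional histogram on the super-resolved grid. The sketch below assumes that the centroids are available as coordinates in micrometres; the function name, field of view and sample points are hypothetical, whereas the 10 µm × 8 µm pixel size is the one stated above.

```python
import numpy as np

def density_map(z_um, x_um, depth_um, width_um, pixel_z_um=10.0, pixel_x_um=8.0):
    """Count localizations per pixel on a grid of pixel_z_um x pixel_x_um cells."""
    n_z = int(round(depth_um / pixel_z_um))
    n_x = int(round(width_um / pixel_x_um))
    z_edges = np.linspace(0.0, n_z * pixel_z_um, n_z + 1)
    x_edges = np.linspace(0.0, n_x * pixel_x_um, n_x + 1)
    counts, _, _ = np.histogram2d(z_um, x_um, bins=[z_edges, x_edges])
    return counts

# Hypothetical centroids (µm): two events in the first pixel, one in the last.
z = np.array([5.0, 5.0, 95.0])
x = np.array([4.0, 4.0, 76.0])
counts = density_map(z, x, depth_um=100.0, width_um=80.0)
```

Pixels that collect more events are brighter in the final image, which is why longer acquisitions fill in the smallest vessels.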

A displacement vector was drawn between these positions, enabling the evalu- 
ation of the instantaneous in-plane velocities of the bubbles, computed as the rate 
of displacement from one frame to the next frame divided by the time interval.
Only tracks composed of more than 5 frames (10 ms) were considered to evaluate
the velocities. Coloured velocity maps were constructed using the bubble
paths associated with their in-plane velocities (Fig. 2d). More specifically, blue 
corresponds to the velocities towards the top and the red refers to the in-plane 
velocities towards the bottom. Taken separately but treated equally, the veloc- 
ity maps were exploited to retrieve the velocity profiles of each downstream 
and upstream micro-vessel. In Fig. 2e, f, we selected two representative vessels,
(4) and (5), whose velocities were oriented towards the bottom and towards
the top, respectively. We evaluated the number of bubbles in a fixed-resolution
cell (Δxz), across the sections of the two chosen vessels, and extracted
the fastest 50% of bubbles. Then, we measured the mean ± standard deviation
of each thresholded in-plane velocity vector and performed an unpaired
Student’s t-test. When Δxz was chosen between 8 µm and 12 µm, the quantification
of the velocity distribution for each resolution cell gave a result that
was statistically different from the adjacent one (P < 0.05). Finally, ULM was
performed to reconstruct the vascular network and quantify the velocity maps 
in the whole brain. In Extended Data Fig. 2, we show how the microvasculature 
of the brain was retrieved with high resolution in depth (11.6 mm) along different
coronal imaging planes (from Bregma −0.5 mm to Bregma −4.5 mm).
Each of these ultrasound acquisitions was divided into three panels of 4 mm
depth to properly filter out the thinned skull bone. Supplementary Video 2
shows the various coronal slices taken during the experiments. It should be
noted that the same filter was applied to reconstruct the vasculature of the
cortex in Figs 2a and 3a, and Extended Data Fig. 2a-i. The in-plane velocity
maps in Extended Data Fig. 3 were attained with the same data treatment as
Figs 2d and 3b. They enable the quantification of velocity distributions in depth
in the whole brain, corresponding to the coronal imaging planes in Extended
Data Fig. 3.

Extended Data Figure 1 | Schema of the temporal and spatial
localization of unique sources. a, Stack of B-mode images. The region
of interest corresponds to a region of 2 mm × 1.1 mm within the cortex.
b, Spatiotemporal filtering of the B-mode images shows the presence of
decorrelating microbubbles in each frame (1-4). c, The four representative
frames are separated by 44 ms (1-4). d, Computed two-dimensional PSF
of the rescaled and filtered ultrafast acquisitions. These echoes are then
interpolated and the Cartesian coordinates of their centres are obtained
(1-4). The summit of each two-dimensional Gaussian profile identifies the
centroid of each separable source.

Extended Data Figure 2 | uULM coronal scan (anterior–posterior) of
the entire rat brain through a thinned skull window. a–i, The
ultrasound probe was driven by a micro-step motor to perform uULM
on different imaging planes separated by 500 µm. We reconstructed the
vascularization of the rat brain at the following coordinates: Bregma
−0.5 mm (a), −1 mm (b), −1.5 mm (c), −2 mm (d), −2.5 mm (e), −3 mm
(f), −3.5 mm (g), −4 mm (h), −4.5 mm (i).

Extended Data Figure 3 | Anterior–posterior scan of in-plane velocity maps of the rat forebrain through a thinned skull window. a–i, Velocity maps
for the different coronal planes presented in Extended Data Fig. 2.

Extra adsorption and adsorbate superlattice
formation in metal-organic frameworks


Hae Sung Cho, Hexiang Deng, Keiichi Miyasaka, Zhiyue Dong, Minhyung Cho, Alexander V. Neimark, Jeung Ku Kang,
Omar M. Yaghi & Osamu Terasaki


Metal-organic frameworks (MOFs) have a high internal surface 
area and widely tunable composition’, which make them useful 
for applications involving adsorption, such as hydrogen, methane 
or carbon dioxide storage*°. The selectivity and uptake capacity 
of the adsorption process are determined by interactions involving 
the adsorbates and their porous host materials. But, although 
the interactions of adsorbate molecules with the internal MOF 
surface!°-!” and also amongst themselves within individual 
pores!®-”” have been extensively studied, adsorbate-adsorbate 
interactions across pore walls have not been explored. Here we 
show that local strain in the MOF, induced by pore filling, can give 
rise to collective and long-range adsorbate-adsorbate interactions 
and the formation of adsorbate superlattices that extend beyond 
an original MOF unit cell. Specifically, we use in situ small-angle 
X-ray scattering to track and map the distribution and ordering 
of adsorbate molecules in five members of the mesoporous MOF- 
74 series along entire adsorption-desorption isotherms. We find 
in all cases that the capillary condensation that fills the pores 
gives rise to the formation of ‘extra adsorption domains’—that is, 
domains spanning several neighbouring pores, which have a higher 
adsorbate density than non-domain pores. In the case of one MOF, 
IRMOF-74-V-hex, these domains form a superlattice structure 
that is difficult to reconcile with the prevailing view of pore- 
filling as a stochastic process. The visualization of the adsorption 
process provided by our data, with clear evidence for initial 
adsorbate aggregation in distinct domains and ordering before an 
even distribution is finally reached, should help to improve our 
understanding of this process and may thereby improve our ability 
to exploit it practically. 

Figure 1 shows the three distinct types of interaction in which 
adsorbates in MOFs can engage: adsorbate molecules can interact 
with the material’s internal surface (regime A); adsorbates can inter- 
act among themselves within the confines of a pore (regime B); and 
adsorbates can interact with each other across pores mediated by the 
material framework (regime C). Studying the collective adsorbate 
behaviour in regimes B and C requires porous MOF crystals, with 
pores that are large enough to enable the organization and behaviour 
of confined adsorbates to be observed, and with pore walls that are 
atomically thin and well-defined so as to allow observation of any 
local perturbations resulting from adsorption. In such systems, we can 
then use in situ small-angle X-ray scattering (SAXS) to detect long- 
range ordering of adsorbates in multiple pores at precisely controlled 
temperatures and pressures. 

We chose the five mesoporous MOFs with isoreticular structure 
(IRMOF-74-III, IRMOF-74-IV, IRMOF-74-V, IRMOF-74-V-hex 
and IRMOF-74-VII) that are based on the crystalline IRMOF-74 


structure®?-*°, The robustness of the IRMOF-74 honeycomb-like 
structure (in projection) is imparted by one-dimensional, rod-shaped 
magnesium oxide units that run along the pore direction and are held 
together by organic linkers (Fig. 2a). This rigid oxide unit allows for 
structural refinements in two dimensions, by keeping constant the 
structure along the c axis of the original MOF structure® (Fig. 2b). 
Thus, we apply the projected symmetry of the two-dimensional space 
groups (plane groups) p3 or p6 for the unit cell (Fig. 2b, green paral- 
lelogram). We therefore need only two variables, h and k, to specify 
the reflections with the h and k indices for the refinement. This allows 
us to focus on the adsorption region, and stops us from having to deal 
unnecessarily with the more complicated original symmetry R3 in 
IRMOF-74-IV, IRMOF-74-V and IRMOF-74-VII, or R3 in IRMOF-74-III
and IRMOF-74-V-hex (Fig. 2b, red parallelogram).
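
Because only the indices h and k are needed, the position of each reflection follows from the standard spacing formula for a two-dimensional hexagonal lattice, 1/d² = 4(h² + hk + k²)/(3a²). The sketch below evaluates it; the function names and the cell edge of 40 Å are placeholders for illustration, not refined values from this work.

```python
import math

def hex_d_spacing(h, k, a):
    """d-spacing of the (h k) reflection of a two-dimensional hexagonal lattice.

    a is the cell edge; the result has the same length unit as a.
    """
    return a / math.sqrt(4.0 / 3.0 * (h * h + h * k + k * k))

def scattering_vector(h, k, a):
    """Magnitude of the scattering vector, q = 2 * pi / d."""
    return 2.0 * math.pi / hex_d_spacing(h, k, a)

a_angstrom = 40.0  # hypothetical cell edge
d_10 = hex_d_spacing(1, 0, a_angstrom)
d_11 = hex_d_spacing(1, 1, a_angstrom)
d_20 = hex_d_spacing(2, 0, a_angstrom)
q_10 = scattering_vector(1, 0, a_angstrom)
```

The characteristic ratios of q for the (10), (11) and (20) reflections, 1 : √3 : 2, are what identify hexagonal ordering in a small-angle pattern.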

All of these MOFs exhibit open porosity and have mesopores with 
sizes of 22 Å, 28 Å, 35 Å and 49 Å (for IRMOF-74-III, IRMOF-74-IV,
IRMOF-74-V and IRMOF-74-VII, respectively). IRMOF-74-V-hex,
having a pore size of 34 Å, was constructed with a linker functionalized


[Figure 1 legend: adsorbate-wall interaction; adsorbate-adsorbate interaction within an individual pore; adsorbate-adsorbate interaction across adjacent pores mediated by the framework. Labels: adsorbates, MOF, pore.]

Figure 1 | Three adsorbate-interaction regimes in mesoporous MOFs.
In regime A, adsorbed molecules interact (green arrows) with pore walls.
In regime B, adsorbates interact with each other (blue arrows) within
a pore. These two types of interaction and the corresponding regimes
have been well studied. Regime C, however, has not been explored; here,
adsorbates interact with each other (red arrows) across pore walls, in a way
that is mediated by the framework. Light blue, molecules adsorbed onto
the internal pore surface; yellow, molecules in the centre of the pores.

Figure 2 | Structure of the IRMOF-74 series in three and two
dimensions. a, The honeycomb-like structure (in projection) of MOFs
of the IRMOF-74 series is imparted by one-dimensional, rod-shaped
magnesium oxide secondary building units, held together by organic
linkers (III, IV, V, V-hex or VII). b, Green dashes show the two-
dimensional unit cell, corresponding to plane groups p3 and p6, that we


with a hexyl chain (V-hex). The atomically thin walls of the pores in 
these MOFs and their large pore sizes are key factors in their suitability 
for examining the collective behaviour of the adsorbates within and 
across the pores (regimes B and C, Fig. 1). 

In contrast to other in situ adsorption studies, performed using a 
synchrotron beamline!®!”%, we used a laboratory-designed SAXS 
set-up operating in transmission mode with a rotating anode X-ray 
source, a graded confocal optic, and a Kratky block system to create a 
monochromatic beam focusing on the detector. Incorporation of an 
adsorption apparatus in the SAXS system enables measurement of 
both X-ray-diffraction profiles and gas-adsorption isotherms from 
the same sample at a precisely controlled temperature and adsorbate 
pressure (see Supplementary Information, section 1). We illustrate 
SAXS-based adsorption tracking for argon uptake by IRMOF-74-V- 
hex, for which the adsorption process can be divided into stages 1 to 5, 
taking place within the pressure ranges 0 to 0.5 kPa, 0.5 to 27 kPa, 27 to 
33 kPa, 33 to 50 kPa and 50 to 100 kPa, respectively (Fig. 3a). Although 
the shape of the isotherm is similar to that of a type IV isotherm (as 
classified by the International Union of Pure and Applied Chemistry), 
typical for mesoporous materials, the distinct slopes seen in stages 
4 and 5 point to two major differences. To understand the origin of
these slopes, we measured SAXS profiles along the entire adsorption curve,
among which 11 different gas pressures (Fig. 3a) were selected to rep- 
resent the different stages of the argon adsorption process (Fig. 3b). 
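
As an illustration only, the five pressure ranges quoted above can be encoded as a small lookup. The function name `adsorption_stage` and the convention that a boundary pressure belongs to the stage that begins at that pressure are assumptions made for this sketch; the stage limits themselves are the argon values given for IRMOF-74-V-hex.

```python
from bisect import bisect_right

# Upper pressure limit (kPa) of stages 1 to 5 for argon uptake by
# IRMOF-74-V-hex, as quoted in the text.
STAGE_UPPER_KPA = [0.5, 27.0, 33.0, 50.0, 100.0]


def adsorption_stage(pressure_kpa: float) -> int:
    """Return the adsorption stage (1-5) for an argon pressure in kPa.

    Assumed convention: a pressure that sits exactly on a boundary is
    assigned to the stage that starts there (100 kPa stays in stage 5).
    """
    if not 0.0 <= pressure_kpa <= STAGE_UPPER_KPA[-1]:
        raise ValueError("pressure lies outside the 0-100 kPa isotherm range")
    index = bisect_right(STAGE_UPPER_KPA, pressure_kpa)
    return min(index, len(STAGE_UPPER_KPA) - 1) + 1


for p in (0.3, 30.0, 33.0, 40.0, 100.0):
    print(f"{p:6.1f} kPa -> stage {adsorption_stage(p)}")
```

Such a lookup is convenient when each measured SAXS profile has to be labelled with its stage before the profiles are compared.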

The electron distribution of argon atoms introduced into the pores 
was obtained from 54 (7 independent) reflections, using difference 
Fourier analysis of the measured intensity profile of the argon-filled 
MOF and the calculated intensity profile of the corresponding acti- 
vated MOF without argon (Fig. 3d; see also Supplementary Figs 4 
and 5 and Tables 1-4). (Note that, although we cannot determine 
the argon distribution with atomic resolution owing to limitations 
imposed by the maximum q range (the largest angle that can be 
detected; q = 4π sin θ/λ) of the SAXS instrument, the resolution is sufficient
to map the electron-distribution trend within the large pores;
applied for structural refinement. Red dashes show the √3 × √3 unit cell,
which corresponds to the projection of the original space groups R3̄ and
R3, used to reveal adsorbate distribution. c, The 2 × 2 superlattice cell
(purple dashes) and the pores in violet contain a larger number of
adsorbates, compared with the surrounding pores in blue.


an exception is the centre point, which might be slightly affected 
by the termination effect in the Fourier synthesis.) The adsorbate 
electron-distribution maps (Fig. 3d) reveal that, as expected and in 
agreement with previous findings’®, argon interacts strongly with the 
open metal sites of the magnesium oxide units in stages 1 and 2. The 
electron-density map of argon at 27 kPa (Fig. 3d) and correspond- 
ing electron density distribution profile (Fig. 3c) show two to three 
cylindrical layers of argon atoms adsorbed onto the walls (regime A in 
Fig. 1). This is followed by argon condensation in the pores, which 
commences at stage 3 and is accompanied by a steep increase in gas 
uptake (Fig. 3a, d). At roughly midway through stage 3, the corre- 
sponding hk = 10 reflection intensity decreases sharply (Fig. 3b; 
Supplementary Fig. 10), while a new broad peak at q = 0.10 Å⁻¹
emerges (marked by grey dashes in Fig. 4a), which is evidence of collective
adsorbate-adsorbate interactions. Although the pores are not yet
completely filled, as indicated by the smeared-out electron density in 
the centre region of the pore (Fig. 3d), the emergence of this broad 
peak unambiguously marks an important point (termed the
aggregation point) at which extra adsorption domains begin to
form, whereby adsorbate atoms gather in certain pore regions in
higher numbers than the average.

The intensity of this broad peak reaches a maximum as stage 3 turns 
to stage 4 (33 kPa), and then decreases gradually to eventually disap- 
pear at the end of stage 4 (Fig. 4a). Furthermore, the density of argon 
in the centre region increases more than it does around the pore walls 
(Fig. 3d, stage 4) during both the appearance and the disappearance of 
the peak. The correspondence between the appearance/disappearance
of this peak and the characteristics of the pore-filling process indicates
that this unusual phenomenon originates from the complex collective 
behaviour of argon: argon atoms are not equally distributed through- 
out the available pores during stages 3 and 4 of the condensation 
process, but instead exhibit density fluctuations that result in extra 
adsorption domains with a higher-than-average argon concentration 
spanning several contiguous pores. 
Figure 3 | Mapping of argon distribution in IRMOF-74-V-hex. a, Argon
uptake by IRMOF-74-V-hex at different gas pressures. The isotherm shows 
five stages (1 to 5), with distinct slopes. Three points (red, dark green and 
light blue) are highlighted for the start, end/start and end of two events 
unobserved in type IV isotherms. b, SAXS scattering profiles measured 
along the entire adsorption process at 11 different gas pressures, covering 
the different stages of argon adsorption. The patterns are overlaid in
linear scale, with colours corresponding to the points in the isotherm.


Further evidence for the formation of these domains comes from 
the adsorption profile slope in stage 4, which differs strongly from that 
in stage 3. The formation of the domains causes unit-cell contraction 
and associated broadening of the diffraction intensity profiles (hk = 10, 
11, 20, 21, 30, 22 and 31) during stage 3, and expansion of the unit cell 
and sharpening of the associated profiles during stage 4 (Fig. 4b-d). 
These effects are correlated with changes in the local strain of the MOF 
backbone, which results in the emergence of an additional stage in the 
overall adsorption process in IRMOF-74, as indicated by the full-width
at half-maximum (FWHM) of the SAXS profile peaks and by the unit-cell
parameter (contraction versus expansion), which reach their maximum and
minimum values, respectively, during stage 3 (Fig. 4c, d). In stage 4,
the strain induced by the adsorption heterogeneity starts to smear out 
and the FWHM decreases as more argon atoms enter the pores and 
move towards a more homogenized arrangement, leading to a different 
slope for gas uptake*’. We note that the changes in the FWHM and 
unit-cell parameters of IRMOF-74 during and after mesopore con- 
densation resemble those accompanying gas adsorption in MCM-41 


c, Three-dimensional contour map of the electron-density profile of argon 
at 27 kPa. The mesopores are covered by argon at this point. d, Projected 
argon distribution in two dimensions from the three-dimensional contour 
maps. Each two-dimensional map reveals the argon distribution within the 
MOF structure at a certain pressure. The red lines at 27 kPa indicate the 
argon profile projection in two directions (metal site to metal site, and wall 
to wall). 


(a typical mesoporous silica with relatively thick pore walls’’), but 
the magnitude of the changes in our system is much larger than that 
observed for MCM-41. This indicates that the adsorbates stress the 
IRMOF-74 framework, with its thinner MOF walls, more than they 
stress the more-sturdy MCM-41 framework; this also explains why 
a broad peak at low q range and a unique slope at stage 4 could not 
be observed during and after mesopore condensation for MCM-41 
(Supplementary Fig. 12). 

The fate of the extra adsorption domains in IRMOF-74-V-hex 
can be gleaned from the abrupt appearance of superlattice reflections
in the SAXS patterns (reflections at q = 0.25 Å⁻¹ (marked
by grey dashes) in Fig. 4a, and at q = 0.42 Å⁻¹ in Supplementary
Fig. 40) at the start of stage 4. The intensity of the reflections 
decreases as the pressure increases from 33 kPa, and becomes 
zero at 50 kPa (Fig. 4b), accompanied by a decrease in the 
FWHM of all the profile reflections. We infer from these observa- 
tions that extra adsorption domains form, and that the contrast 
between these domains and the surrounding domains increases 
Figure 4 | Extra adsorption domains and argon adsorbate superlattice in
IRMOF-74-V-hex. a, The appearance and disappearance of the broad peak
(at q = 0.093 Å⁻¹) and the superlattice peak (at q = 0.25 Å⁻¹) in the SAXS
patterns during the adsorption process, with intensity magnified by four in
the right-hand image. b, Intensity changes of 1+ and +1 superlattice 
reflections. c, Tracking of the unit-cell parameter change of IRMOF-74-V- 
hex during the adsorption process. d, Tracking of the corresponding FWHM 


during stage 3 of the adsorption process (Supplementary Fig. 11); 
moreover, we conclude that once the point of maximum contrast 
(termed the organization point) has been reached, the domains com- 
mence to form the adsorbate superlattice in stage 4. It is the superlat- 
tice formation that relieves local strain—increasingly so as the contrast 
lessens and as more adsorbates fill the pores towards uniform distri- 
bution (Supplementary Fig. 11). 

The precise structure of the superlattice is determined from the 
positions of the reflections at q = 0.25 Å⁻¹ and q = 0.42 Å⁻¹, mentioned
above, indexed by hk= 15 and >I and corresponding to an adsorbate 
superlattice structure with a 2 × 2 unit cell (Fig. 2c, purple parallelogram,
and Fig. 4e). Although this structure also gives a reflection of
50 (q = 0.093 Å⁻¹) that overlaps with the broad peak, the position and
intensity of the two observed peaks rule out the possibility of
a √3 × √3 superlattice (Fig. 2b, red parallelogram, and Fig. 4e) that
might form through modulation of the MOF structure. In such a case, 
the corresponding ordered reflection would have appeared at 
q = 0.29 Å⁻¹ for hk =; 3 however, these were absent in the SAXS
patterns. Note also that the line-widths of the adsorbate superlattice 
reflections are much larger than those of fundamental reflections, 
further suggesting that the origin of these extra peaks is not associated 
with the framework lattice. Detailed analysis of the FWHM revealed 
that the size of the superlattice domains is about 400 Å.
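
The geometric argument above can be checked with a short calculation. For a two-dimensional hexagonal lattice with cell parameter a, the reflection hk lies at q = 4π/(a√3) · √(h² + hk + k²); this is a standard relation, not one stated in the text. In the sketch below the cell parameter of 39 Å is an illustrative assumption (chosen so that a half-order reflection falls near 0.093 Å⁻¹), and the fractional indices are candidate positions for the two superlattice types, not the authors' index assignment.

```python
import math


def q_hex(h: float, k: float, a: float) -> float:
    """Scattering-vector magnitude (1/Å) of reflection (h, k) for a
    two-dimensional hexagonal lattice with cell parameter a (Å)."""
    return 4.0 * math.pi / (a * math.sqrt(3.0)) * math.sqrt(h * h + h * k + k * k)


A_CELL = 39.0  # Å; illustrative assumption, not a value quoted in the text

# Candidate fractional indices on the original two-dimensional cell.
# Hypothetical choices for illustration only.
candidates = {
    "2 x 2": [(0.5, 0.0), (1.0, 0.5), (2.0, 0.5)],
    "sqrt3 x sqrt3": [(1 / 3, 1 / 3), (2 / 3, 2 / 3), (4 / 3, 1 / 3)],
}

for name, indices in candidates.items():
    for h, k in indices:
        print(f"{name:14s} hk = ({h:.3f}, {k:.3f})  q = {q_hex(h, k, A_CELL):.3f} 1/Å")
```

Under these assumptions the half-order positions fall at roughly 0.093, 0.25 and 0.43 Å⁻¹, close to the observed peaks, whereas the third-order positions fall near 0.11, 0.21 and 0.28 Å⁻¹.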

Upon further increases in the argon pressure, the extra adsorption 
domains and superlattice reflections disappear at the end of stage 4 
(Fig. 3a). During the next stage (stage 5), the adsorption isotherm 
domains (aggregation point, 30 kPa) and superlattice (organization point,
33 kPa) formed as a result of argon being distributed unevenly among
adjacent mesopores. Green, red and purple dashes indicate the original
MOF, √3 × √3, and 2 × 2 unit cell, respectively. The size of the argon
superlattice domain (dark blue dashes at 33 kPa) is about 400 Å.


shows a new slope, and the electron density in the centre region of 
the pores gradually increases (Fig. 3d, stage 5) and leads to a slight 
unit-cell expansion (Fig. 4c) to accommodate more incoming argon 
atoms in a uniform manner among different pores. This changeover 
point in the isotherm (termed the homogenization point) marks the 
initiation of uniform pore expansion: the adsorbate superlattice disap- 
pears and homogenization of the adsorbate density takes place without 
involvement of the long-range adsorbate-adsorbate interactions that
are mediated by local strain in the MOF framework. In terms of the 
amount of argon uptake, stages 4 and 5 account for up to 22% of the 
total uptake in IRMOF-74-V-hex. 

An overview of how different SAXS characteristics document 
the different stages in the overall adsorption process is provided in 
Extended Data Fig. 1. The desorption process of argon in IRMOF- 
74-V-hex—which was also carefully studied (Fig. 3a) and compared 
with the adsorption process in detail (Supplementary Figs 7-9, 
Supplementary Tables 5 and 6, and Supplementary Video)—involves 
the same stages as those seen during adsorption. 

The broad peak that is seen at low q values was observed in all 
IRMOFs for all three adsorbates studied (argon, nitrogen and car- 
bon dioxide) (Supplementary Fig. 14). During stage 3, this peak 
was observed in the SAXS intensity profiles at q = 0.12 Å⁻¹ and
q = 0.094 Å⁻¹ for IRMOF-74-IV and IRMOF-74-V, respectively.
From the distance distribution function derived from the SAXS
data in the q range of 0.016 Å⁻¹ to 0.18 Å⁻¹ for IRMOF-74-IV, and
0.016 Å⁻¹ to 0.16 Å⁻¹ for IRMOF-74-V and IRMOF-74-V-hex, the
maximum size of individual extra adsorption domains is calculated
to be approximately 60 Å for IRMOF-74-IV, and 70 Å for IRMOF-
74-V and IRMOF-74-V-hex. Although extra adsorption domains were
seen in all IRMOF-74 compounds during the pore-filling process, the 
intensity of the additional reflections that are attributed to superlattice 
formation was negligible in the case of IRMOF-74-IV and IRMOF- 
74-V. The hexyl chains of IRMOF-74-V-hex thus seem to be important 
in superlattice formation, although pore size will also be relevant (as 
superlattices were not detected in IRMOF-74-VII, where hexyl chains
are present but within the confines of larger pores). 

The changes in the SAXS profiles seen during adsorption and deso- 
rption of all three adsorbates follow similar patterns (Supplementary 
Information, sections 2-5). Intriguingly, we also find that each of 
the three adsorbates desorbs at a different pressure, and that this 
adsorbate-specific desorption pressure is, to a first approximation, 
independent of the exact nature and pore size of the IRMOFs tested 
(Supplementary Tables 5, 9 and 11). This observation is another clear 
piece of evidence that adsorbate-adsorbate interactions within and 
across adjacent pores play a major role in gas uptake and release, both 
at the outset of the desorption process and in the formation of extra 
adsorption domains. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Synthesis of IRMOF-74 series. Organic linkers were synthesized as reported 
previously*. IRMOF-74 samples were synthesized by combining organic linkers 
with Mg(NO₃)₂ in a solution of dimethylformamide, ethanol and water, and then
heating the mixture in an oven at 120 °C for 24 hours*. Needle-shaped crystals clustered in
spherical forms were obtained. These IRMOF-74 samples were evacuated after 
solvent exchange with methanol nine times in three consecutive days to remove 
guest molecules. 

In situ gas adsorption SAXS measurement. The in situ SAXS measurements for 
Ar, CO₂ and N₂ adsorption by IRMOF-74s (III, IV, V, VII and V-hex) were performed
using a SAXS instrument (BioSAXS-1000; Rigaku, USA) equipped with
a rotating anode X-ray source (FR-E+ Super Bright; Rigaku, Japan) and a gas 
adsorption instrument (BELSORP-max) together with a specially designed cell on 
a cryostat (Bel, Japan). We incorporated a sample cell inside the SAXS instrument, 
with a small chamber connected to the gas adsorption instrument placed outside. 
In addition, we used a large area detector combined with copper Kα radiation from
a rotating-anode X-ray source to provide precise measurement of both the intensity
and the position of the diffraction peaks within a wide q (= 4π sin θ/λ) range,
from 0.01 to 0.71 Å⁻¹. Measurements were carried out with copper Kα radiation
in the transmission mode with Confocal Max Flux Mirror, a two-dimensional 
Kratky block and a Pilatus-type detector in the SAXS instrument. The powder 
samples were mounted in two places next to each other at the same adsorbate 
environmental condition: one was within the hollow part of the stainless steel 
rectangular plate covered by polyether ether ketone (PEEK) polymer films, in the 
X-ray path for diffraction; the other was for improving accuracy in measuring gas 
adsorption/desorption isotherms. The assembled samples were connected ther- 
mally to the temperature-controlling cryostat system, where the temperature is 
controlled within ± 0.01 K, and to the gas adsorption instrument. The position of
the sample cell was adjusted to the X-ray pathway within the chamber of the SAXS 
instrument at low temperatures before starting to take measurements. 
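
To give a feel for the quoted q range, the following sketch converts q into the scattering angle 2θ and into the real-space periodicity d = 2π/q. The wavelength of 1.5418 Å is the commonly used weighted-average value for copper Kα radiation and is an assumption here, because the text does not state the value used; the function names are likewise illustrative.

```python
import math

CU_K_ALPHA = 1.5418  # Å; assumed weighted-average Cu Kα wavelength


def q_to_two_theta_deg(q: float, wavelength: float = CU_K_ALPHA) -> float:
    """Invert q = 4π sin θ / λ and return the scattering angle 2θ in degrees."""
    s = q * wavelength / (4.0 * math.pi)
    if not 0.0 <= s <= 1.0:
        raise ValueError("q is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


def q_to_d_spacing(q: float) -> float:
    """Real-space periodicity d = 2π / q (Å) probed at scattering vector q (1/Å)."""
    if q <= 0.0:
        raise ValueError("q must be positive")
    return 2.0 * math.pi / q


for q in (0.01, 0.71):
    print(f"q = {q:.2f} 1/Å: 2θ = {q_to_two_theta_deg(q):.3f}°, d = {q_to_d_spacing(q):.1f} Å")
```

With this assumed wavelength, the upper limit of 0.71 Å⁻¹ corresponds to a scattering angle of roughly 10° and a periodicity of about 8.8 Å, which is consistent with the statement that atomic resolution is out of reach.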

A known weight (~0.03 g) of the IRMOF-74-III, IRMOF-74-IV, IRMOF-74-V, 
IRMOF-74-VII and IRMOF-74-V-hex samples was mounted in the sample cell 
and activated at 373 K for 6 hours under vacuum (~0.01 Pa) to remove the guest 
molecules before a series of measurements was taken. Activation of the IRMOF- 
74s was confirmed by comparing the argon isotherm of these MOFs at 87 K and 
CO₂ isotherm at 273 K with those of the activated sample measured on the traditional
adsorption instrument (Supplementary Fig. 3). Gases (Ar, N₂ or CO₂) were
introduced into the sample cell under measurement temperatures; the gas pressure 
was changed, and then maintained for 5 min after the system reached equilibrium 
(we judge the system to have reached equilibrium if the pressure fluctuation is 
less than 1 Pa for 5 min, which took roughly 30 min), for each measurement. The 
SAXS instrument was synchronized to the gas adsorption measurement and each 
SAXS pattern was collected at each equilibrium point of the sorption isotherms 
(the exposure time for each measurement was 30 minutes). There was no pressure 
change after the SAXS measurement, confirming that the sample with adsorbates 
in the sample cell was at equilibrium. 
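
The equilibrium criterion described above (pressure fluctuation below 1 Pa for 5 min) can be sketched as a simple check on a logged pressure trace. This is not the instrument's control software; the function name, the data layout and the reading of 'fluctuation' as the peak-to-peak variation within the window are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def is_equilibrated(
    readings: Sequence[Tuple[float, float]],
    window_s: float = 300.0,
    tolerance_pa: float = 1.0,
) -> bool:
    """Decide whether a pressure log satisfies the equilibrium criterion.

    readings: (time in seconds, pressure in Pa) pairs in chronological order.
    Returns True if the log covers at least `window_s` seconds and the
    peak-to-peak pressure variation over the most recent `window_s`
    seconds is below `tolerance_pa`.
    """
    if not readings:
        return False
    t_end = readings[-1][0]
    if t_end - readings[0][0] < window_s:
        return False  # not enough history to judge
    recent = [p for t, p in readings if t >= t_end - window_s]
    return max(recent) - min(recent) < tolerance_pa


# Hypothetical traces sampled every 10 s for 10 min.
stable = [(t, 27000.0 + 0.2 * ((t // 10) % 2)) for t in range(0, 601, 10)]
drifting = [(t, 27000.0 + 0.05 * t) for t in range(0, 601, 10)]
print(is_equilibrated(stable), is_equilibrated(drifting))
```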

Before the actual adsorption/SAXS measurement started, gas adsorption 
without SAXS measurement was performed to confirm the adsorption curve
and to set up the SAXS measurement points. We then collected SAXS scattering
profiles at each of the 24 equilibrium points in the adsorption process, includ- 
ing the initial point (in vacuum). Another 21 profiles were collected for the 
desorption process. No transformation in the structure of the backbone of 
IRMOF-74 occurred throughout the whole gas adsorption process, as con- 
firmed by the absence of obvious changes in peak positions in these SAXS pat- 
terns. Moreover, the samples did not show structural differences after in situ gas 
adsorption SAXS measurement, confirmed by adsorption data and SAXS data 
in the vacuum. 

Structural analysis. For the structural analysis of IRMOF-74s at different gas 
pressures, Le Bail refinements”* were performed using the JANA program” over 
the full sampled angular range, on the basis of the space group R3 for IRMOF- 
74-IV, IRMOF-74-V and IRMOF-74-VII, and R3 for IRMOF-74-III and IRMOF- 
74-V-hex. The SAXS patterns of activated IRMOF-74 samples in the vacuum 
condition were refined first as a reference. The reflection peaks were modelled 
by a pseudo-Voigt peak-shape function modified for asymmetry, with six refinable
coefficients. The background was treated using a Legendre polynomial with
six refinable parameters. Because the q range for the SAXS instrument could 
cover only hk0 reflections owing to the small unit-cell parameter c, only unit-cell 
parameter a was refined. The standard deviations of all data were derived from 
comparison of observed points in SAXS profiles with corresponding ones cal- 
culated after Le Bail refinement. The atomic coordinates for all MOF samples 
were adopted from the framework structures derived from single-crystal X-ray- 
diffraction data* (Supplementary Tables 1-4). Because the number of reflections 
is limited, framework atomic coordinates were fixed for all data with different 
gas pressures. The distribution of adsorbates was calculated by difference Fourier 
analysis between the observed intensity and calculated intensity after a careful 
check of phase relationships among different reflections, and visualized using 
the VESTA program*”. The calculated intensity was derived from the atomic 
coordinates obtained from single-crystal X-ray-diffraction analysis of MOF 
structures and the atomic coordinates were fixed for different gas pressures. The 
correct phase relationship of the crystal-structure factors between different 
reflections was verified by the fact that we could observe electrons at the open 
metal sites at the beginning of gas uptake. Electron-density-map data were illustrated
using the √3 × √3 p6 cell, which is the hexagonal projected structure of R3̄
and R3, in order to show clearly the electron distribution in the pores (Fig. 2b,
red parallelogram). The level of electron density (e~ A~*) is represented in blue/ 
green/red colour code for all IRMOF-74 data. All electron-density-map data 
were presented with atomic coordinates of IRMOF-74 to clarify the relative posi- 
tion of adsorbates in the MOF. 
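
The principle of the difference Fourier analysis described above can be sketched on a toy two-dimensional cell. The code below is a generic illustration, not the authors' procedure or software: it forms coefficients (|F_obs| − |F_calc|)·exp(iφ_calc) from the observed amplitudes and the phases of a guest-free model, and sums them over a coarse grid. The point-scatterer model, its weights and the omission of the overall scale factor are hypothetical simplifications.

```python
import cmath
import math


def structure_factor(atoms, h, k):
    """Structure factor of point scatterers given as (weight, x, y) in
    fractional coordinates."""
    return sum(w * cmath.exp(2j * math.pi * (h * x + k * y)) for w, x, y in atoms)


def difference_map(reflections, n=8):
    """Difference density on an n x n grid of fractional coordinates.

    reflections: iterable of (h, k, |F_obs|, F_calc as a complex number).
    The phase of F_calc is applied to the amplitude difference; the
    overall 1/area scale factor is omitted.
    """
    grid = [[0.0] * n for _ in range(n)]
    for h, k, f_obs, f_calc in reflections:
        coeff = (f_obs - abs(f_calc)) * cmath.exp(1j * cmath.phase(f_calc))
        for i in range(n):
            for j in range(n):
                wave = cmath.exp(-2j * math.pi * (h * i / n + k * j / n))
                grid[i][j] += (coeff * wave).real
    return grid


# Toy model (hypothetical): a heavy framework site plus one light guest.
framework = [(10.0, 0.0, 0.0)]
guest = [(1.0, 0.5, 0.5)]
hk = [(h, k) for h in range(-3, 4) for k in range(-3, 4)]
reflections = [
    (h, k, abs(structure_factor(framework + guest, h, k)), structure_factor(framework, h, k))
    for h, k in hk
]
grid = difference_map(reflections)
peak = max((grid[i][j], i, j) for i in range(8) for j in range(8))
print("strongest difference density at grid point", peak[1:], "of an 8 x 8 grid")
```

In this toy case the strongest difference density appears at the grid point that corresponds to the guest position (0.5, 0.5), which is the behaviour exploited to locate adsorbates when the framework coordinates are held fixed.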
Extended Data Figure 1 | The five stages of gas adsorption in
IRMOF-74s. The five different adsorption stages are indicated in red 
at the top of the figure and their boundaries demarcated throughout all 
panels by grey dashed lines. a, The measured Ar adsorption by IRMOF- 
74-V-hex is shown; it can be compared against relevant SAXS profile 
features of IRMOF-74, measured as a function of Ar pressure, that are 
shown in the other panels. b, The appearance and disappearance of
the broad peak indicates the formation of extra adsorption domains
over pores (aggregation, red) and the even distribution of adsorbates 
(homogenization, blue). c, Intensity of 14 superlattice reflection, 
appearing as stage 3 turns to stage 4 (organization, green) and 
disappearing at the end of stage 4 (homogenization, blue). d, Change 
in the unit-cell parameter a of IRMOF-74. e, Change in the
line-profile width of IRMOF-74.


© 2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


OPEN 


doi:10.1038/nature15714 


Single-molecule sequencing of the desiccation- 
tolerant grass Oropetium thomaeum 
Plant genomes, and eukaryotic genomes in general, are typically
repetitive, polyploid and heterozygous, which complicates genome 
assembly!. The short read lengths of early Sanger and current 
next-generation sequencing platforms hinder assembly through 
complex repeat regions, and many draft and reference genomes 
are fragmented, lacking skewed GC and repetitive intergenic 
sequences, which are gaining importance due to projects like 
the Encyclopedia of DNA Elements (ENCODE)’. Here we report 
the whole-genome sequencing and assembly of the desiccation- 
tolerant grass Oropetium thomaeum. Using only single-molecule 
real-time sequencing, which generates long (>16 kilobases) 
reads with random errors, we assembled 99% (244 megabases) 
of the Oropetium genome into 625 contigs with an N50 length of 
2.4 megabases. Oropetium is an example of a ‘near-complete’ draft 
genome which includes gapless coverage over gene space as well as 
intergenic sequences such as centromeres, telomeres, transposable 
elements and rRNA clusters that are typically unassembled in draft 
genomes. Oropetium has 28,466 protein-coding genes and 43% 
repeat sequences, yet with 30% more compact euchromatic regions 
it is the smallest known grass genome. The Oropetium genome 
demonstrates the utility of single-molecule real-time sequencing for 
assembling high-quality plant and other eukaryotic genomes, and 
serves as a valuable resource for the plant comparative genomics 
community. 

The genomes of Arabidopsis’, rice’, poplar, grape and Sorghum? 
were first sequenced using high-quality and reiterative Sanger-based 
approaches producing a series of ‘gold standard’ reference genomes. 
The advent of next-generation sequencing (NGS) technologies reduced 
costs of sequencing substantially, which has enabled sequencing of over 
100 plant genomes!. The quality of plant genome assemblies depends 
on genome size, ploidy, heterozygosity and sequence coverage, but most 
NGS-based genomes have on the order of tens of thousands of short 
contigs distributed in thousands of scaffolds. The short read lengths of 
NGS, inherent biases and non-random sequencing errors have resulted 
in highly fragmented draft genome assemblies that are not complete, 
which means they are missing biologically meaningful sequences 
including entire genes, regulatory regions, transposable elements, 
centromeres, telomeres and haplotype-specific structural variations. 
It is becoming clear from ENCODE projects that complete genomes 
are needed to better understand the importance of the non-coding 
regions of genomes’. 

More than 40% of calories consumed by humans are derived from 
grasses, and the grass family (Poaceae) is arguably the most important 
plant family with regard to global food security®. The size and complexity
of most grass genomes has challenged progress in gene discovery
and comparative genomics, although draft genomes are now available
for most agriculturally important grasses’. The largest genome
assemblies, such as maize (2,300 megabases (Mb))’, barley (5,100 Mb)® 
and wheat (hexaploid, 17,000 Mb)? are highly fragmented as a result 
of the inability of current sequencing technologies to span complex 
repeat regions. Near-finished reference genomes are available for rice’, 
Sorghum? and Brachypodium"®, but more high-quality grass genomes 
are needed for comparative genomics and gene discovery. Here we pres- 
ent the ‘near-complete’ draft genome of the grass Oropetium thomaeum, 
the first high-quality reference genome from the Chloridoideae sub- 
family. The draft genome is near complete because we were able to 
sequence through complex repeat regions that are unassembled in most 
draft genomes. Oropetium has the smallest known grass genome at 
245 Mb and is also a resurrection plant that can survive extreme
water stress, such as the loss of >95% of cellular water (Fig. 1)!!.
Single-molecule real-time (SMRT) sequencing (Pacific Biosciences) 
produces long and unbiased sequences, which enables assembly of 
complex repeat structures and GC- and AT-rich regions that are often 
unassembled or highly fragmented in NGS-based draft genomes. We 
generated ~72× sequencing coverage of the Oropetium genome using
32 SMRT cells on the PacBio RS II platform (which is equivalent to <1 
week of sequencing time and <US$10,000 in reagents). The resulting 
sequence had a read N50 length of over 16 kilobases (kb), and there was 
10× coverage of reads over 20 kb in length (Extended Data Fig. 1a). The
raw reads were error-corrected using the hierarchical genome assembly 
process (HGAP), and the longest reads (>16 kb) were assembled using 
Celera assembler followed by two rounds of genome polishing using 
Quiver!”. The assembly contains 650 contigs spanning 99% (244 Mb)
of the estimated 245 Mb genome size (Extended Data Fig. 1b) with a 
contig N50 length of 2.4 Mb (Extended Data Fig. 1c). The final assem- 
bly consists of 625 contigs after removal of the complete chloroplast 
genome, mitochondria-derived contigs and contaminants. The 35 
largest contigs span half the genome, and the largest 107 contigs contain 
90% of the sequence. The 135,324 base-pair (bp) chloroplast genome 
assembled into a single contig that includes both of the ~25-kb inverted
repeat regions, which typically collapse into a single copy during
assembly. The mitochondrial genome was assembled into 20 partially
overlapping circular chromosomes, which are the product of 
intramolecular recombination events that collectively span 1,100 kb. 
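
For readers unfamiliar with the contiguity statistics quoted above, the following sketch shows how a contig N50 and the number of largest contigs needed to cover a given fraction of an assembly are usually computed. The contig lengths are made-up toy values, not the Oropetium assembly, and the function names are illustrative.

```python
def n50(lengths):
    """Contig N50: the length L such that contigs of length >= L together
    contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs supplied")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length


def contigs_to_cover(lengths, fraction):
    """Number of largest contigs needed to cover `fraction` of the assembly."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count
    return len(ordered)


toy_contigs = [100, 80, 60, 40, 20]  # hypothetical lengths, arbitrary units
print(n50(toy_contigs), contigs_to_cover(toy_contigs, 0.5), contigs_to_cover(toy_contigs, 0.9))
```

Applied to a real assembly, the same two functions would yield figures of the kind reported here, namely the N50 length and the counts of contigs spanning half and 90% of the sequence.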
The Oropetium genome has high contiguity for an uncurated 
draft plant genome. The average contig N50 length for all published 
plant genomes is 50 kb compared to 2.4 Mb for Oropetium (Extended
Data Fig. 1d, e). After manual curation and data augmentation, only 
the Arabidopsis (TAIR10)!3, rice (V7) and Brachypodium (V 2.1)'° 
genomes have longer contig N50 lengths. The accuracy rate is very 
Figure 1 | Desiccation tolerance in the resurrection grass Oropetium thomaeum. a, Well watered. b, Desiccated (relative water content <5%) after
9 days of drought stress. c, Condition 24 h post-hydration (relative water content >70%).


high at 99.99995%, which is similar to Sanger-based approaches and 
higher than most NGS-based assemblies (Extended Data Fig. 1h). 
We plotted repeat density and GC content along the length of the 
contigs to identify factors causing contig breaks (Extended Data 
Fig. 1f, g). There is no correlation between repeat density and GC 
content at contig break points. This suggests that contig break points 
occur at the start of repeats or that most assembly breaks are caused 
by other factors, such as within-genome heterozygosity or haplo- 
type-specific structural variation. To test this, we also tried ‘dip- 
loid-aware’ assemblers Falcon (https://github.com/PacificBiosciences/ 
falcon) and MinHash Alignment Process (MHAP)!*. These assem- 
blies had similar metrics but were less contiguous overall (Extended 
Data Fig. 1i). 
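
The quoted accuracy can be put on the Phred scale that is widely used for sequence quality, Q = −10·log10(error rate). The text does not report a Phred value, so the conversion below is offered only as a worked illustration; it also assumes that the accuracy figure is a per-base consensus accuracy and uses the 244-Mb assembly size to estimate the expected number of erroneous bases.

```python
import math


def accuracy_to_phred(accuracy_percent: float) -> float:
    """Convert a per-base accuracy (in percent) to a Phred-scaled quality."""
    error_rate = 1.0 - accuracy_percent / 100.0
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 100%")
    return -10.0 * math.log10(error_rate)


def expected_errors(accuracy_percent: float, assembly_bp: int) -> float:
    """Expected number of erroneous bases in an assembly of `assembly_bp` bases."""
    return (1.0 - accuracy_percent / 100.0) * assembly_bp


q = accuracy_to_phred(99.99995)
n_err = expected_errors(99.99995, 244_000_000)
print(f"Q = {q:.1f}, about {n_err:.0f} expected errors in 244 Mb")
```

Under these assumptions an accuracy of 99.99995% corresponds to a quality value of about 63, or on the order of a hundred erroneous bases across the whole assembly.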

The completeness of the Oropetium genome allowed us to accu- 
rately survey its highly repetitive features that are often unassembled in 
most plant genomes. The Oropetium assembly captures all 18 telomeric 
arrays (Extended Data Table 1) with repeat number ranging from 40 to 
900, suggesting that at least some are full length. Three of the nine cen- 
tromeric satellites are completely assembled into large inverted repeats 
spanning 400 kb with a base monomer length of 155 bp, and higher 
order structures of dimers (310 bp), trimers (465 bp) and tetramers 
(620 bp; Fig. 2, Extended Data Fig. 2 and Supplementary Table 1). The 
remaining 40 centromeric sequences are incomplete centromere repeat 
fragments broken during assembly or solo repeats not associated with 
a larger centromere satellite. Nucleolus organizer regions contain tan- 
dem arrays of the 18S, 5.8S and 25S ribosomal RNA (rRNA) genes and 
typically span several megabase pairs with hundreds of nearly identi- 
cal 10-kb arrays. Twenty-two full-length rRNA tandem arrays in six 
contigs are found in the Oropetium assembly (Extended Data Table 2). 
The largest tandem array contains five identical and one partial 9-kb 
repeats collectively spanning 51 kb; this is approaching the theoretical 
limit given the read-length distributions of our data. The remaining 
rRNA tandem repeats probably collapsed during read correction or 
genome assembly given their high sequence conservation. 

Most repeats are incomplete, unassembled or highly collapsed in 
Illumina/454 NGS-based genomes, which has led to an underestima- 
tion and misclassification of repeat content in most plant genomes. 
Repetitive elements account for a surprisingly high proportion of the 
Oropetium genome (43%) compared to 21% in Brachypodium'®, 35% 
in rice*, 54% in Sorghum? and over 90% in wheat? (Extended Data 
Table 3). Similar to these other genomes, the long terminal repeat (LTR) 
retrotransposons are the most abundant class and account for 35.6% of 
the Oropetium genome. We identified 3,247 intact LTRs in 358 families, 
which is similar to rice (3,663) and Brachypodium (2,162), but far fewer
than Sorghum (17,022). Only ~2% of the repeats are unclassified,
which reflects the completeness of individual repeat elements due to 
the long reads. 

Genome size in the grasses varies by several orders of magnitude as a
consequence of polyploidy and genome bloating due to repetitive DNA
accumulation. Oropetium has the smallest known genome among the
grasses at 90%, 60%, 50%, 30% and 10% the size of Brachypodium,
rice, Setaria, Sorghum and maize, respectively. We found that
Oropetium has a solo:intact LTR ratio >1, which is similar to small 
grass genomes like rice and Brachypodium, where proliferating LTRs are 
removed by illegitimate recombination, whereas large grass genomes 
like Sorghum and maize have solo:intact LTR ratios <1 (ref. 15). Despite 
its compact size, the Oropetium genome has a typical number of pre- 
dicted protein coding genes at 28,446. A pan-cereal whole-genome 
duplication (WGD) event, called rho, occurred before the diversification
of grasses. There appear to have been no further WGDs
in the selected grass genomes, including Oropetium, since the shared
rho event.

Genome alignments between Oropetium and selected grass genomes 
are mostly one-to-one after exclusion of the alignments derived from 
the shared genome duplication events (Extended Data Fig. 3a-e). 
Overall, 75% of the Oropetium genome, or 89% of its gene space, is 
contained in conserved syntenic blocks when compared to other 
grasses. Genomic colinearity across grass genomes is extensive, with a 
high density of orthologous genes spanning much of the euchromatin 
(Fig. 3). Insertions of retrotransposons and non-collinear genes that 
originated elsewhere in the genome contribute greatly to the differences 
in the intergenic sequences in grasses.

The relative sizes of syntenic blocks in the grass genomes track
closely with the overall genome size difference (Extended Data Fig. 3f).

Figure 2 | SMRT sequencing enables contiguous sequencing over
complex regions. The distributions of centromere-specific satellite DNA
(CenOt), long terminal repeat retrotransposons (LTRs), DNA transposable
elements (DNA-TE) and coding DNA sequences (CDS) are plotted.
a, The gap-free assembly of a full-length centromeric array and the
flanking highly repetitive pericentromeric region. b, The largest contig
(7.8 Mb), which has a more typical distribution of elements.

Figure 3 | Compact genome structure of Oropetium. Oropetium, part
of the PACMAD clade, provides the first high-quality reference genome
from the Chloridoideae subfamily, a large and diverse group of ~1,600
species that contains the orphan crops tef (Eragrostis tef) and finger millet
(Eleusine coracana). Typical micro-colinearity patterns among genomic


In contrast, the genomic span of coding sequences is similar across 
genes that are retained in orthologous locations, although coding fea- 
tures are slightly smaller in Oropetium (Extended Data Table 4). The 
relatively constant sizes of coding sequences among grass genomes 
confirm that genome size differences are indeed due to variations in 
the intergenic contents. It was thought that plants have a ‘one-way 
ticket to genome obesity’ due to the retention of proliferating transposable
elements. However, analysis of the carnivorous plants Utricularia
gibba (bladderwort, 82 Mb) and Genlisea aurea (corkscrew, 63.6 Mb)
provided evidence that almost all intergenic space can be purged. Small 
genomes also arise from a reduction in gene number as seen in the 
aquatic monocotyledon Spirodela polyrhiza, which has the fewest pre- 
dicted protein coding genes at 19,623 (ref. 24). Oropetium seems to have 
reduced both its intergenic and intragenic sequence. 

As the intergenic sequence in Oropetium is specifically reduced com- 
pared with other grasses (Extended Data Fig. 3f), we determined which 
sequence accounted for its smaller genome size by comparing highly 
syntenic regions of the larger 730 Mb Sorghum genome. To identify 
highly orthologous regions we looked for Sorghum genes (promoter,
5′UTR, exons, introns and 3′UTR) with an increased number of conserved
noncoding sequences. We then analysed the top 48 Sorghum
genes against their orthologous sequences in Oropetium and found
that they were 38% (±0.27, 1 s.d.) larger in Sorghum (Extended Data
Fig. 4a). The primary driver of gene-space expansion was highly unique 
~1-kb intragenic sequences evenly spaced within the Sorghum genes. 
One explanation is that these evenly spaced highly unique sequences 
are degenerate remnants of transposons that have been partly purged 
from the Sorghum genome. Oropetium has a >1 solo:intact LTR ratio, 
consistent with active purging of transposons and complete loss of 
these regions. These results lend support to an emerging theory about 
the C-value paradox called the Genome Balance Hypothesis, which
suggests that selection on gene networks and pericentromeric growth 
(centromere movement) is balanced by transposon proliferation and 
retention. Therefore, these evenly spaced highly unique sequences 
balance the 6:1 expansion of pericentromeric sequence in Sorghum as 
compared to Oropetium (Extended Data Fig. 4b). 

Desiccation tolerance was a key adaptation that permitted the 
most recent common ancestor of terrestrial plants to survive on land. 
Desiccation tolerance is widespread in bryophytes and lichens but rare in 
flowering plants, although similar mechanisms have evolved in vascular 
plants for seed and pollen desiccation. Desiccation tolerance to survive 
prolonged drought evolved independently in diverse monocotyledon 
and eudicotyledon lineages, and is found in at least 300 species. Gene 
duplications have provided the raw material for evolutionary innova- 
tion across plants. Tandem duplicated genes are often involved in stress 
responses and are probably important for adaptive evolution in dynam- 
ically changing environments. Oropetium has 6,668 tandem duplicated 
genes in 2,326 clusters, which is a slightly higher number than in other
grasses, but a similar proportion (24% of genes). Tandem duplicated
genes are enriched for gene ontology terms involved in response to abi- 
otic stresses, gene regulation and cellular metabolism (Supplementary 
Table 2). In addition, Oropetium has 4,209 homeologous gene pairs 
retained from the rho WGD event, which are enriched for gene ontology 
terms related to gene regulation and stress responses such as transcrip- 
tion factor activity, nitrogen metabolism, response to abiotic stimulus, to 
salt stress and to oxygen-containing compounds (Supplementary Tables 
3 and 4). Understanding the genomic mechanisms of extreme desicca- 
tion tolerance in resurrection plants such as Oropetium may provide 
targets for engineering drought and stress tolerance in crop plants. 

Pacific Biosciences (PacBio) SMRT sequencing has been used 
to close gaps in the human genome, assemble complete bacterial
genomes and identify novel gene isoforms. Here we present a plant genome of several
hundred megabases, sequenced and assembled entirely by
SMRT sequencing. The long SMRT reads produced a near-complete 
draft genome that captured three of nine complete centromeres, all 
of the telomeres and biologically relevant features of the Oropetium 
genome. The total time from extracted DNA to a complete assembly 
was less than one month, and costs for PacBio were comparable to 
an Illumina-based genome assembly. Our study demonstrates that 
SMRT sequencing enables a new level of genome assembly required 
for full ENCODE-type analysis of intergenic sequence, which is not 
currently possible with other NGS-based methods. The compactness 
of the Oropetium genome results from purging of both inter- and intra- 
genic sequences, probably through small deletions during illegitimate 
recombination, as has been shown in other grasses. One hypothesis is 
that genome size is a function of cell size, and consistent with this, all
small plant genomes sequenced to date including Arabidopsis (125 Mb),
Brachypodium (272 Mb), Selaginella (100 Mb), Spirodela (158 Mb) and
Utricularia (82 Mb) are plants of very small stature (Fig. 1). However, 
we provide evidence for the Genome Balance Hypothesis, which sug- 
gests that there is selective pressure on Oropetium to purge proliferat- 
ing transposons in order to maintain expression balance of networked 
genes and spacing in centromeres. The complete assembly of complex 
and highly similar repeat sequences demonstrated here suggests that 
SMRT sequencing can be used to assemble large and polyploid plant 
and other eukaryotic genomes, assuming ample sequence coverage and 
computational resources. SMRT-sequencing-based assemblies provide 
an opportunity to determine how these regions play a role in genome 
architecture and dynamics. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


No statistical methods were used to predetermine sample size. 

Plant material. Oropetium thomaeum is a compact resurrection plant that has
the smallest known genome among the grasses, at 245 Mb and 9 chromosomes
(2n = 2x = 18; 1C = 0.25 pg). We estimated the genome size to be 250 Mb by
flow cytometry and 245 Mb by k-mer analysis (Extended Data Fig. 1b). Oropetium
thomaeum plants were originally collected in Jodhpur, Rajasthan, India and propagated
as previously described. Oropetium is a member of the Chloridoideae
subfamily, a large and diverse group of roughly 1,600 species that contains the
orphan crops tef (Eragrostis tef) and finger millet (Eleusine coracana) as well as
some turf grasses (such as Bermuda grass, Cynodon dactylon and Zoysia japonica).

SMRT PacBio sequencing. Fifty micrograms of high-molecular-weight Oropetium
gDNA was extracted using a modified nuclei preparation method followed by
an additional high-salt phenol-chloroform purification to minimize contamina- 
tion. A 20-kb insert SMRTbell library was generated using a 15 kb lower-end size 
selection protocol on the BluePippin (Sage Science). Initial titration runs were 
performed to optimize loading on the SMRT Cell for maximum performance. The 
Oropetium genome was sequenced using 32 SMRT Cells with 4-h collections and 
P6-C4 chemistry on the PacBio RS II platform (Pacific Biosciences). 

HGAP genome assembly. The Oropetium genome was assembled using the 
RS_HGAP_Assembly.3 protocol for assembly and Quiver for genome polishing
in SMRT Analysis v2.3.0. This consisted of a three-step process involving
(1) generation of preassembled reads with improved consensus accuracy; 
(2) assembly of the genome through overlap consensus accuracy using Celera; and 
(3) one round of genome polishing with Quiver. For HGAP, the following param- 
eters were used: PreAssembler Filter v1 (minimum sub-read length = 3,000 bp, 
minimum polymerase read quality = 0.80, minimum polymerase read 
length = 3,000 bp); PreAssembler v2 (minimum seed length = 16,000 bp, number 
of seed read chunks = 6, alignment candidates per chunk = 10, total alignment 
candidates = 24, min coverage for correction = 6); AssembleUnitig v1 (target 
genome coverage = 30, overlap error rate = 0.06, minimum overlap = 40 bp and 
overlap k-mer = 14); and BLASR v1 mapping of reads for genome polishing with 
Quiver (max divergence percentage = 30, minimum anchor size = 12). A second 
round of genome polishing was performed using Quiver (SMRT Analysis v2.3.0) to 
further improve the site-specific consensus accuracy of the assembly. The following 
Quiver parameters were used for genome polishing: filtering (minimum sub-read 
length = 3,000 bp, minimum polymerase read quality = 0.80, minimum polymer- 
ase read length = 3,000 bp); mapping (maximum divergence percentage = 30, 
minimum anchor size = 12). Default parameters were otherwise employed for both 
HGAP assembly and Quiver protocols. 

Falcon and MHAP assemblies. We also tested other assemblers to compare the 
PacBio HGAP assembly results (Extended Data Fig. 1i). Raw PacBio reads were 
error-corrected and assembled using Falcon and MHAP under default parame- 
ters. The Falcon and MHAP assemblies have lower contiguity than the HGAP 
assembly and have fewer assembled centromere and telomere sequences with a 
lower average length. 

Construction of a genome map using the Irys system for contig anchoring 
and scaffolding. Genome mapping from BioNano Genomics was used to
improve the assembly quality of the Oropetium genome with the eventual goal 
of producing a chromosome-scale assembly. High molecular weight genomic 
DNA was isolated from fresh Oropetium tissue using the following protocol 
outline. Three grams of leaves were collected from live Oropetium thomaeum 
plants and fixed with formaldehyde. After blending with a tissue homogenizer 
in isolation buffer, a filtration step and Triton-X washing treatment were per- 
formed. The nuclei were purified on percoll cushions. The nuclei were washed 
extensively and embedded in low melting agarose at different dilutions. Finally, 
the DNA plugs were treated with a lysis buffer containing detergent, proteinase K
and β-mercaptoethanol (BME). In total, 53 Gb of data (>100 kb) were
collected representing ~200× genome coverage with a molecule N50 length of
169 kb (Extended Data Fig. 5a). The size distribution was lower than expected 
and is probably a result of impurities during high-molecular-weight gDNA 
isolation that would cause shearing and inhibition of enzymes. Molecules were 
de novo assembled as previously described. Two genome maps were assembled
at different stringencies: map set 1 has 402 maps with an N50 length of 725 kb and
spans 216 Mb (Extended Data Fig. 5b); the second genome map has 214 maps and 
an N50 of 1.674 Mb. Combining the genome maps with the PacBio assembly to 
produce a hybrid scaffold was performed sequentially with the two genome maps. 
The scaffolding merged 90 contigs producing an assembly of 46 primary scaffolds 
covering 94% of the sequence assembly with an N50 of 7.8 Mb; in total there are 
535 scaffolds with an N50 of 7.1 Mb and total assembled size of 244 Mb. 
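
For readers reproducing the contiguity statistics quoted here, N50 is conventionally the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below assumes that standard definition; the function name `n50` and the toy lengths are illustrative only and are not values from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_lengths = [8, 7, 5, 3, 2]
toy_n50 = n50(toy_lengths)
```

The same function applies unchanged to molecule, genome-map, contig or scaffold lengths, since only the list of lengths differs.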

Variant calling using Illumina data. WGS Illumina sequences from Oropetium 
gDNA were used to assess the error rate of the PacBio assembly and residual 
within-genome heterozygosity (Supplementary Table 5). Raw Illumina HiSeq data
from three different libraries of 570-bp insert, 1-kb insert and 3-kb insert sizes
were trimmed for quality using Trimmomatic (v.0.32; ref. 33). Illumina sequence 
adaptors were removed, leading low quality (below quality 3) and N base pairs 
were trimmed, and reads were scanned using a 4-bp sliding window and trimmed 
when the average quality per base dropped below 30. Read pairs where both reads 
were ultimately of at least 36 bp in length following this quality control process 
were retained and used for subsequent analyses. 
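
The trimming rules described above (leading bases below quality 3 or called as N, a 4-bp sliding window with a mean-quality threshold of 30, and a 36-bp minimum length for both mates) can be illustrated with a short sketch. This is a simplified approximation written for illustration, not a reimplementation of Trimmomatic; the function names `trim_read` and `keep_pair` are hypothetical, and quality scores are assumed to be already decoded to integers.

```python
def trim_read(seq, quals, leading_q=3, window=4, min_avg_q=30):
    """Apply a simplified leading trim followed by a sliding-window trim.

    seq   : read sequence as a string
    quals : list of integer base qualities, same length as seq
    """
    # Step 1: drop leading bases that are N or fall below the leading threshold.
    start = 0
    while start < len(seq) and (seq[start] == "N" or quals[start] < leading_q):
        start += 1
    seq, quals = seq[start:], quals[start:]

    # Step 2: scan 5' to 3' and cut at the first window whose mean quality
    # drops below the threshold. Reads shorter than one window are left as is.
    for i in range(max(len(seq) - window + 1, 0)):
        if sum(quals[i:i + window]) / window < min_avg_q:
            return seq[:i], quals[:i]
    return seq, quals


def keep_pair(read1, read2, min_len=36):
    """Keep a read pair only if both trimmed mates reach the minimum length."""
    return len(read1) >= min_len and len(read2) >= min_len
```

In this sketch a pair is discarded as soon as either mate falls below 36 bp, which mirrors the requirement that both reads pass the length filter.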

Quality trimmed data were aligned to our assembly using BWA mem
(v.0.7.12-r1039). Duplicate alignments were marked using Picard tools v.1.104
MarkDuplicates (http://broadinstitute.github.io/picard/). Genome Analysis Toolkit
(v.3.3.0) IndelRealigner was used to perform local realignment around indels,
followed by application of GATK HaplotypeCaller to call variants. Identified single 
nucleotide polymorphisms were filtered by depth, strand bias, mapping quality and 
read position. Identified indels were filtered by depth, strand bias and read position. 

The native error rate of raw PacBio reads is in the range of 15–20%, raising
the possibility that residual sequencing errors may be introduced into the final
assembly of the Oropetium genome. Homozygous mismatches are classified as 
sequencing errors, and heterozygous mismatches indicate sites of heterozygosity. 
The accuracy rate is very high at 99.99995%, and a relatively high proportion of the 
errors (two-thirds) are small insertions or deletions (indels). The accuracy rate is 
similar to those obtained with WGS Sanger approaches and is higher than those
reported for most NGS-based assemblies. The estimated residual within-genome 
heterozygosity for the Oropetium genome is very low at 0.087%, which probably 
contributed to the high contiguity of the assembly. This suggests that provided 
sufficient coverage, a PacBio SMRT-only approach can produce a high-quality 
complete plant genome. 

Repeat annotation. To structurally annotate repeat sequences in the Oropetium
genome, we began by discovering repetitive elements through application of the
REPET v.2.2 packages TEdenovo and TEannot. The TEdenovo pipeline compares
the genome with itself to identify and classify repeated genomic elements. 
All-by-all alignments were conducted with NCBI-BLAST+ using default 
TEdenovo parameters. LTRharvest was used for structural detection. During
clustering, Grouper, Recon and Plier steps were invoked both with and without 
structural detection. Consensus building was performed using default parameters. 
During consensus feature detection, RepeatScout was invoked, and Pfam26.0 HMM
profiles and Repbase (v18.08) nucleotide and amino acid databanks were used.
Finally, consensus classification, filtering and clustering were performed using 
default parameters. 

Output from the TEdenovo pipeline was used as input to the TEannot pipeline. 
This pipeline mines the genome sequence using repeated sequences identified in the 
previous TEdenovo pipeline to produce classified non-redundant consensus repeat 
sequences along with short simple repeats, which are exported to GFF3 format. 
First, a set of perfectly matching sequences from the TEdenovo-output transposable 
elements (TE) library was selected by running a subset of the TEannot pipeline, pro- 
ducing a working reference TE library. This TE library was used in a full run of the 
TEannot pipeline. For alignment of the reference TE library, NCBI-BLAST+ was 
used, and Blaster, RepeatMasker and Censor steps were run both on the reference TE
library and on randomized chunks. Filtering was applied using default parameters. 
Short simple repeats were identified using the crossmatch engine. Merging was 
performed using default parameters. For comparisons, Repbase (v18.08) nucleotide 
and amino acids databanks were used. Finally, filtering was applied using default 
parameters, and annotations were exported to GFF3 format. 

To classify identified repeats, non-redundant consensus repeat sequences
as output by TEannot were annotated via PASTEClassifier v1.0
(https://urgi.versailles.inra.fr/Tools/PASTEClassifier/README). To classify these sequences,
Repbase (v18.08) nucleotide and amino acid sequences were used, as were
Pfam v26.0 (http://pfam.xfam.org/) HMM repeat profiles. Finally, identified 
LTRs were classified as Gypsy if homology or motif evidence existed for Gypsy 
and not for Copia, classified as Copia if the opposite were true, and otherwise 
classified as unknown. 

Centromere and telomere identification. Centromeric repeats were identified
using an approach outlined in ref. 42. Tandem Repeats Finder (TRF, version 4.07b)
was used to find tandem repeats using the parameters ‘1 1 2 80 5 200 2000 -d
-h’ in order to find higher-order repeats. The resulting ‘.dat’ file was transformed
into a GFF3 file, which was used to identify telomeric and centromeric repeats.
To identify the centromeric repeats, the largest repeat arrays (period length ×
copy number) were identified and clustered. Clustered centromeric repeat regions
were transformed into FASTA files and aligned using clustalX to identify array
sequence composition and orientation. The base centromere repeat was 155 bp,
with higher-order structures of dimers (310 bp), trimers (465 bp) and tetramers
(620 bp) (Extended Data Fig. 2 and Supplementary Table 1). The three largest
centromeric arrays (contigs 003, 028 and 064) were >400 kb and resolved into
large inverted repeats, consistent with them being full length. The telomeric
repeats were identified by searching
the ends of contigs for short (~7 bp) high copy number repeats; 18 telomeric repeat
sequences with the monomer AAACCCT were identified (Extended Data Table 1).
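
Two steps in this procedure lend themselves to a brief illustration: ranking tandem arrays by period length × copy number, and recognising that telomeric monomers reported in different phases or on opposite strands are the same repeat. The sketch below is one possible way to do both, under the assumption that tandem-repeat records have already been parsed into dictionaries; the helper names and the example records are hypothetical and are not taken from the analysis pipeline.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def canonical_monomer(monomer):
    """Collapse all rotations and both strands of a repeat unit to one key.

    The key is the alphabetically smallest string among every rotation of
    the monomer and of its reverse complement.
    """
    candidates = []
    for strand in (monomer, reverse_complement(monomer)):
        candidates.extend(strand[i:] + strand[:i] for i in range(len(strand)))
    return min(candidates)


def rank_arrays(arrays):
    """Order tandem arrays by period length x copy number, largest first."""
    return sorted(arrays, key=lambda a: a["period"] * a["copies"], reverse=True)


# Hypothetical tandem-repeat records, not values from the assembly.
arrays = [
    {"contig": "ctg_A", "period": 155, "copies": 2600.0},
    {"contig": "ctg_B", "period": 7, "copies": 900.0},
    {"contig": "ctg_C", "period": 310, "copies": 40.0},
]
ranked = rank_arrays(arrays)
```

With this canonical form, the monomers listed in Extended Data Table 1 (for example AACCCTA, AGGGTTT and GTTTAGG) all reduce to AAACCCT, and the higher-order centromeric units are simple multiples of the 155-bp monomer (2 × 155 = 310, 3 × 155 = 465, 4 × 155 = 620).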
Transcriptome assembly. Total RNA was extracted from fresh, desiccated and 
24-h post rehydration Oropetium leaf tissues with 2 biological replicates collected 
for each tissue. RNA-seq libraries were prepared from the total RNA and bar-coded 
using TruSeq RNA Sample Prep Kits (Illumina) according to the manufacturer's 
protocol. Raw Illumina RNA-seq data from the six libraries were trimmed for qual- 
ity using Trimmomatic (v.0.32; ref. 33). Illumina sequence adaptors were removed, 
then leading low-quality (below quality 3) and N base pairs were trimmed and, 
finally, resulting trimmed reads were scanned using a 4-bp sliding window and cut 
when the average quality per base dropped below 30. Read pairs where both reads 
were ultimately of at least 36 base pairs in length following this quality control process
were retained and used for subsequent analyses. Trinity (v.r20140717) was
used to assemble quality filtered data. Assembled transcripts were aligned to our 
genome sequence using NCBI blastn v.2.2.30+ with an e-value cut-off of 1 x 10°. 
Successfully aligned transcripts were clustered at 90% identity using CD-HIT 
(v. 4.5.4), with representative sequences from each cluster retained and used to 
help parameterize gene calling. Eighty-seven per cent of the trimmed RNA-seq 
reads aligned to the Oropetium genome, suggesting that the genome is largely 
complete (Supplementary Table 5). Reads that failed to align may have been 
contaminants from other organisms. 

Gene annotation. Maker v2.31.8 (http://www.yandell-lab.org/software/maker.html)
was used to identify putative genes. Aligned and representative sequences
from our transcriptome assembly were input to Maker as expressed sequence tag
evidence. Rice and Brachypodium proteome sequences were clustered at 90% identity
using CD-HIT (v. 4.5.4), with representative sequences from each cluster
retained and input to Maker as multi-organismal protein homology evidence.
The Oropetium repeat database was input to Maker as a custom repeat library. 
SNAPhmm, Augustus, and GeneMarkHMM were invoked by Maker and were 
initially trained using rice and maize. Only genes for which the encoded protein 
was predicted to contain a complete open reading frame were retained. 

On the basis of the gene annotations provided by Maker, cufflinks (v2.2.1) was
used to identify predicted genes without empirical expression evidence. Quality- 
trimmed data from all six RNA-seq libraries were input simultaneously to cufflinks, 
with results used to identify genes with and without expression. 

Protein sequences from genes predicted by Maker were functionally annotated 
using NCBI blastp v.2.2.30+ versus the NCBI non-redundant refseq protein database
(http://www.ncbi.nlm.nih.gov/refseq/), versus the UniProt database, and
using InterProScan (v. 5.6-48.0).

Finally, Maker-predicted genes were pruned on the basis of five criteria: a
Maker-defined annotation edit distance (AED) score, which measures the distance
between the predicted gene and the evidence input to Maker; non-redundant (NR)
annotation; Uniprot annotation; InterProScan annotation; and expression level as
output by cufflinks.
Genes were removed that had no alignment evidence (AED = 1), no sequence 
match to either the NR or Uniprot databases, no InterProScan predicted domains 
and no expression evidence in our RNA-seq data. 
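
The pruning rule is conjunctive: a gene is removed only when every line of evidence is absent. A minimal sketch of that logic follows, assuming the per-gene evidence has already been collected into a simple record; the `GeneEvidence` class and its field names are hypothetical and do not correspond to Maker output fields.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Per-gene evidence summary (hypothetical structure for illustration)."""
    gene_id: str
    aed: float                 # annotation edit distance; 1 means no alignment evidence
    has_nr_hit: bool           # sequence match in the NR database
    has_uniprot_hit: bool      # sequence match in the Uniprot database
    has_interpro_domain: bool  # InterProScan predicted domain
    is_expressed: bool         # expression evidence in the RNA-seq data


def is_unsupported(gene):
    """True only if the gene lacks every kind of supporting evidence."""
    return (
        gene.aed >= 1.0
        and not gene.has_nr_hit
        and not gene.has_uniprot_hit
        and not gene.has_interpro_domain
        and not gene.is_expressed
    )


def prune(genes):
    """Return the genes that retain at least one line of evidence."""
    return [gene for gene in genes if not is_unsupported(gene)]
```

A gene with AED = 1 but detectable expression, for example, is kept under this rule, because only the complete absence of support triggers removal.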

Synteny and comparative genomics. Genome data sets from Setaria, Sorghum, 
rice and Brachypodium were downloaded from Phytozome (version 9.1) and 
subject to pairwise genome alignments against the Oropetium genome. For each 
pairwise alignment, the coding sequences of predicted gene models are compared
to each other using adaptive seeds. Our synteny search pipeline defines syntenic
blocks by chaining the large-scale alignment tool (LAST) hits with a distance cut- 
off of 20 genes apart, also requiring at least four gene pairs per syntenic block. 
The syntenic blocks were further screened using QUOTA-ALIGN to retain one-to-one
blocks and to exclude weak blocks derived from shared ancient duplications.
The resulting dot plots were visually inspected to confirm the structural similarity 
of the Oropetium genome in relation to other genomes (Extended Data Fig. 3a-e). 
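
The two block-definition thresholds (anchors no more than 20 genes apart, and at least four gene pairs per block) can be illustrated with a greedy single-linkage chaining sketch. This is a deliberately simplified stand-in and not the synteny pipeline used here: it assumes anchors are given as gene-order indices on a single pair of chromosomes, and the function name `chain_anchors` is hypothetical.

```python
def chain_anchors(anchors, max_dist=20, min_pairs=4):
    """Group anchor gene pairs into candidate syntenic blocks.

    anchors   : iterable of (pos_a, pos_b) gene-order indices, one per
                homologous gene pair on a single chromosome pair
    max_dist  : largest allowed gap, in genes, to the last anchor of a block
                on both genomes
    min_pairs : minimum number of gene pairs for a block to be reported
    """
    blocks = []
    for pair in sorted(anchors):
        for block in blocks:
            last_a, last_b = block[-1]
            close_in_a = 0 < pair[0] - last_a <= max_dist
            close_in_b = abs(pair[1] - last_b) <= max_dist  # allows inversions
            if close_in_a and close_in_b:
                block.append(pair)
                break
        else:
            # No existing block is close enough, so start a new one.
            blocks.append([pair])
    return [block for block in blocks if len(block) >= min_pairs]
```

Blocks that never accumulate four gene pairs are dropped at the end, which is how isolated hits are excluded in this sketch.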

Pairwise genomic alignments, described above, combined with OrthoMCL
analyses filtered to one-to-one hits were used to identify orthologous gene clusters
between Oropetium and Sorghum, rice, Vitis and Arabidopsis. The complete
Oropetium–Arabidopsis orthologue list was then filtered to focus on genes with
functional data in the STRING v9.1 global Arabidopsis protein interaction
network. Gene expression patterns and duplicated genes (tandem and whole-genome
duplicates) were mapped onto this network using Cytoscape v3.1.1
to identify clusters of co-expressed and interacting duplicate genes, respectively
(Extended Data Fig. 6). Various network statistics were calculated using
NetworkAnalyzer, including average number of neighbours (that is, protein interactions)
and total number of isolated nodes (that is, without known interactors).
Constructing a gene interaction network. We constructed a gene interaction 
network for Oropetium on the basis of orthologous relationships with Arabidopsis 
genes with validated interactions and expression data yielding a network with 4,421 
nodes (gene products) with 36,918 edges (interactions). This network encompasses 
most metabolic pathways including photosynthesis, core anabolic and catabolic 
processes and stress response pathways (Extended Data Fig. 6).

Extended Data Figure 1 | Summary of the Oropetium genome assembly
statistics. a, Histogram of length distribution of raw P6-C4 chemistry
PacBio reads. The mean read length of the raw reads is 12,872 bp, and the
N50 is 16,485 bp. b, Genome size estimation using k-mer distribution.
K-mer distribution of unassembled Oropetium Illumina WGS reads.
K-mer frequency displays a unimodal curve indicating a low rate of
heterozygosity in the Oropetium genome. Frequency distribution suggests
a genome size of ~245 Mb, consistent with flow-cytometry-based
estimations. c, SMRT sequencing raw read, preassembly and assembly
statistics. d, e, The distribution of the contig N50 length (d) and scaffold
N50 length (e) of all published plant genomes is plotted. The average
contig N50 length for published plant genomes is ~50 kb compared to
2.4 Mb for Oropetium. f, g, Repeat density (as a function of percentage
repeats) (f) and GC content (g) are plotted at a scaled position along each
contig. Each contig was divided into 5,000 sliding windows with each
window representing 0.02% of the contig length and the averages of each
scaled sliding window are plotted. Repeat content and GC content do
not vary at the ends of contigs. h, Estimated accuracy of SMRT PacBio
assembly and within-genome heterozygosity. i, Comparison of HGAP,
Falcon and MHAP PacBio assemblers.

Extended Data Figure 3 | Macrosynteny patterns and comparative
genomics between the grasses. a–e, Macrosynteny of Oropetium versus
Oropetium (a); Oropetium versus Brachypodium (b); Oropetium versus
rice (c); Oropetium versus Setaria (d); and Oropetium versus Sorghum (e).
f, Genome compaction in Oropetium compared to related grass genomes.
Syntenic block span is based on regions that show conserved synteny
across all five genomes. Syntenic gene and coding DNA sequences span is
based on 13,683 genes that are retained as genes in orthologous locations
across all five genomes. The ratio compared to Oropetium is given in

Extended Data Figure 4 | Expansion of intragenic and pericentric
regions in Sorghum compared to Oropetium. a, A GEvo sequence
similarity graphic of an Oropetium gene (upper) and its orthologous
Sorghum gene (lower). Blast hits (high-scoring segment pairs) are denoted
by red rectangles, and syntenic hits are connected by a red line. The
green rectangles on the model line of Sorghum are conserved noncoding
sequences (CNS) computed between Sorghum and rice; the expanse of
CNS coverage defines ‘gene space’. Within the oval are three CNS that
may be spatially constrained. The expanded interspersed sequences are
annotated at the bottom in black. b, Pericentric region expansion in
Sorghum compared to Oropetium. A syntenic dot plot of the Sorghum
and Oropetium genomes is plotted. Oropetium contigs are ordered based
on synteny with Sorghum. Hits are coloured based on Ks divergence,
with purple blocks corresponding to 1:1 orthologous regions and other
colours corresponding to retained genes from the rho and sigma WGDs.
Pericentric regions in Sorghum have few syntenic matches to Oropetium,
suggesting that much of the expansion occurred in pericentric regions.

in nanochannel arrays is plotted. b, Integration of the genome map with

Extended Data Figure 6 | Network statistics for tandem duplicated genes. a, Tandem duplicated genes in the metabolic network are shown in pink.
b, Distribution of shared neighbours. c, The average number of neighbours.

Extended Data Table 1 | Telomere repeat (AAACCCT) locations and organization in the Oropetium genome

Contig name            Contig size (bp)  Array start  Array end  Array size (bp)  Repeat sequence  Position on contig  Repeat number
Oropetium_genomic_143  99,304            1            6,446      6,445            AACCCTA          start               910.1
Oropetium_genomic_058  1,564,795         1,560,696    1,564,795  4,099            AGGGTTT          end                 580.9
Oropetium_genomic_552  22,498            18,622       22,498     3,876            GTTTAGG          end                 562.9
Oropetium_genomic_043  1,920,679         1            3,643      3,642            CCCTAAA          start               515.7
Oropetium_genomic_050  1,822,802         1            3,223      3,222            CCTAAAC          start               453.3
Oropetium_genomic_125  248,855           1            3,182      3,181            AAACCCT          start               452.4
Oropetium_genomic_027  2,706,558         2            2,092      2,090            AAACCCT          start               301.9
Oropetium_genomic_169  56,172            54,243       56,170     1,927            TTAGGGT          end                 279.7
Oropetium_genomic_103  526,141           524,277      526,139    1,862            GTTTAGG          end                 265.9
Oropetium_genomic_124  262,476           260,617      262,476    1,859            TTTAGGG          end                 264
Oropetium_genomic_090  736,395           1            1,601      1,600            CCTAAAC          start
Oropetium_genomic_010  4,141,579         4,140,107    4,141,579  1,472            TTAGGGT          end                 208.7
Oropetium_genomic_076  1,024,162         1            1,169      1,168            CCCTAAA          start               166.1
Oropetium_genomic_493  25,796            24,869       25,795     926              GGGTTTA          end                 129.9
Oropetium_genomic_136  153,270           152,446      153,270    824              GTTTAGG          end                 119.1
Oropetium_genomic_155  63,826            63,040       63,826     786              TTTAGGG          end                 110.4
Oropetium_genomic_019  3,122,409         1            347        346              AAACCCT          start               48
Oropetium_genomic_149  80,145            1            294        293              AAACCCT          start               40.4
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Extended Data Table 2 | rRNA tandem array locations and organization in the Oropetium genome 


Contig name             Contig size   Start of NOR   End of NOR   Size of NOR   Position of NOR   Number of
                        (bp)          array (bp)     array (bp)   array (bp)    on contig         tandem repeats

Oropetium_genomic_182   51,716        1              51,716       51,716        spans contig      5.7
Oropetium_genomic_265   38,885        1              38,885       38,885        spans contig      4.3
Oropetium_genomic_168   56,772        1              56,772       56,772        spans contig      6.3
Oropetium_genomic_192   48,530        1              42,860       42,860        start             4.7
Oropetium_genomic_214   44,298        31,977         44,298       12,321        start             1.3
Oropetium_genomic_539   23,633        1              20,975       20,975        start             2.3
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Extended Data Table 3 | Repeat annotation of the Oropetium genome 


Repeat class                          Number of    Percent base
                                      elements     pairs covered

Retrotransposon                       214,698      35.60%
  Long terminal repeat (LTR)          107,010      25.50%
    Gypsy (RLG)                       83,872       21.80%
    Copia (RLC)                       18,223       36.90%
    Penelope (RPX)                    1,548        0.15%
    Unknown LTR (RLX)                 3,367        0.44%
  LINE (RIL)                          17,399       1.90%
  SINE (RSX)                          2,735        0.07%
  DIRS (RYD)                          5,098        3.00%
  Unknown retrotransposon (RXX)       82,456       7.50%
DNA transposon                        69,217       8.50%
  Maverick (DMX)                      68           0.01%
  TIR (DTX)                           41,930       6.60%
  Unknown DNA transposon (DXX)        27,219       1.90%
No category                           7,902        1.00%

Total                                 291,817      43.80%
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Extended Data Table 4 | Comparisons of repeats and coding features in the monocotyledons 


                                                                                  Transcript statistics                  Exon statistics             Intron statistics
Common name        Species name               Chr.   Genome   Repeat   # Gene     Avg      Median   Avg num   Median     Count     Avg      Median   Count     Avg      Median
                                              #      size     (%)                 length   length   exons     num exons            length   length             length   length
                                                     (Mb)

Greater duckweed   Spirodela polyrhiza        20     150      23       19,519     4,718    3,015    5.22      3          101,867   222      129      82,368    757      202
Oropetium          Oropetium thomaeum         9      250      43       28,446     2,729    1,928    4.55      3          129,421   210      126      100,975   446      168
brachy             Brachypodium distachyon    [illegible]  272  21     42,868     3,819    3,128    5.38      4          154,738   254      137      120,380   402      142
rice               Oryza sativa               12     403      35       66,338     3,191    2,701    [illegible]  3       238,247   331      162      177,497   389      166
setaria            Setaria italica            9      510      40       29,448     3,299    2,563    4.96      3          134,802   261      137      106,488   436      145
sorghum            Sorghum bicolor            10     818      62       40,599     2,745    2,189    4.74      3          160,151   252      140      122,497   326      133
corn               Zea mays                   10     2,300    85       63,540     4,236    2,747    4.6       3          203,643   238      133      149,177   670      154
Sweet and bitter taste in the brain of awake behaving animals


Yueqing Peng¹,²,³, Sarah Gillis-Smith¹,²,³, Hao Jin¹,²,³, Dimitri Tränkner¹,²,⁴, Nicholas J. P. Ryba⁵ & Charles S. Zuker¹,²,³,⁴


Taste is responsible for evaluating the nutritious content of food, 
guiding essential appetitive behaviours, preventing the ingestion of 
toxic substances, and helping to ensure the maintenance of a healthy 
diet. Sweet and bitter are two of the most salient sensory percepts 
for humans and other animals; sweet taste allows the identification 
of energy-rich nutrients whereas bitter warns against the intake of 
potentially noxious chemicals¹. In mammals, information from
taste receptor cells in the tongue is transmitted through multiple 
neural stations to the primary gustatory cortex in the brain. Recent 
imaging studies have shown that sweet and bitter are represented 
in the primary gustatory cortex by neurons organized in a spatial 
map**, with each taste quality encoded by distinct cortical fields’. 
Here we demonstrate that by manipulating the brain fields 
representing sweet and bitter taste we directly control an animal’s 
internal representation, sensory perception, and behavioural 
actions. These results substantiate the segregation of taste qualities 
in the cortex, expose the innate nature of appetitive and aversive 
taste responses, and illustrate the ability of gustatory cortex to 
recapitulate complex behaviours in the absence of sensory input. 

In mice, sweet and bitter activate cortical fields in the insula (taste 
cortex) that are separated topographically by approximately 2 mm
(ref. 4) (Fig. 1a and Extended Data Fig. 1). We hypothesized that if
these cortical fields represent sweet and bitter percepts, their direct
activation would evoke ‘bitter and sweet sensation’ even in the absence
of an actual bitter or sweet stimulus. To optogenetically control activation
of the gustatory cortex, we introduced channelrhodopsin⁵ (ChR2)
to the insula of wild-type mice by stereotaxic injection of adeno- 
associated virus (AAV) targeted to either the bitter or the sweet corti- 
cal field (see Fig. 1a, b, Extended Data Fig. 1, Supplementary Table 1 
and Methods for details). Single unit recordings of the insular cortex 
of transduced animals demonstrated that photostimulation evoked 
reliable neuronal firing that was phase locked to light delivery (Fig. 1c 
and Extended Data Fig. 1b). 

We reasoned that optogenetic activation of the sweet cortical field 
should trigger behavioural attraction, whereas stimulation of the bitter
field should cause strong behavioural avoidance. We used a place-preference
test⁶ where animals expressing ChR2 in the sweet cortex
were introduced to a two-chamber arena in which presence in one 
of the two chambers was coupled to optogenetic stimulation, in the 
absence of any reward or punishment; we then determined the ani- 
mal’s preference index as a measure of the time spent in the chamber 
that was coupled with light stimulation. When the sweet cortical field 
was stimulated, animals developed strong preference for the chamber 
coupled to ChR2 stimulation (Fig. 1d and Extended Data Fig. 2). This 
preference could be transferred to either side of the arena by switching 
the chamber coupled to the laser stimulation of sweet cortex (Fig. 1d, 
compare chamber 1 versus chamber 2). When the same sets of experi- 
ments were performed in animals expressing ChR2 in the bitter cortical 
field, mice now displayed a range of unconditioned aversive behaviours 


(see next section), and after just a few sessions strongly avoided the 
chamber linked to photostimulation (Fig. 1e). Mice injected with a
control AAV expressing enhanced green fluorescent protein (AAV- 
eGFP construct) exhibited no significant place preference after laser 
stimulation of either the sweet or bitter cortical fields (Extended Data 
Fig. 2b). Together, these observations demonstrate that neurons in the 
sweet and bitter cortical fields drive attractive and aversive responses, 
respectively. 

Next, we examined if activation of the bitter and sweet cortical fields 
evokes classical taste behaviours⁷. We hypothesized that optogenetic
activation of the bitter cortical field should trigger strong light-depend- 
ent suppression of licking, while activation of the sweet cortical field 
should trigger appetitive responses. 

We used a behavioural test where motivated animals (thirsty) were 
trained to lick water in response to a combination visual/tone cue in a 
head-restrained set-up⁸ (see Methods). We then subjected the trained
animals expressing ChR2 in the bitter cortical field to testing sessions 
consisting of a series of water-only trials, but in half of the trials the 
bitter cortical field was stimulated upon contact of the tongue with 
the water spout. 

During the entire session we imaged (facial features), recorded, and 
measured licking responses. Figure 2 demonstrates that when the bit- 
ter cortical field was stimulated, there was a dramatic suppression of 
licking behaviour (see also Supplementary Video 1), with the animal's 
response closely following the ChR2 activation of the bitter cortex. 
Notably, after strong laser stimulation (10–20 mW), the animals displayed
prototypical taste rejection orofacial responses, sometimes
including gagging (gaping⁹), and attempts to clean and rid the mouth
of the non-existent bitter tastant (Supplementary Video 1; see legend 
for details). 

What about the sweet cortical field? A characteristic feature of 
sweet taste is that non-thirsty animals remain robustly attracted to 
sweet solutions, even though they exhibit limited interest in water¹⁰.
Therefore, we predicted that a mildly water-satiated animal express- 
ing ChR2 in the sweet cortical field would still show little attraction 
for water in control trials (referred to as off-trials), but would exhibit 
significantly enhanced licking during water trials coupled to laser 
stimulation of the sweet cortical field (referred to as on-trials). 
Importantly, the experiment was set up such that the laser shutter was 
under contact-licking operation, so the animal had control of its own 
stimulation during the on-trials, and therefore only persistent licking 
(self-stimulation) would continue to activate the sweet cortex. Our 
results demonstrate that animals aggressively self-stimulated during 
on-trial sessions, with ChR2 activation of the sweet cortical field radi- 
cally increasing licking behaviour, even though the spout still delivered 
only water, as in the off-trials (Fig. 2b, d). 

Just as a lot of sugar can ‘mask’ a bitter tastant, we hypothesized that
strong activation of the cortical field representing sweet taste might be 
capable of overcoming the natural aversion to an orally applied bitter 


¹Howard Hughes Medical Institute, Columbia College of Physicians and Surgeons, Columbia University, New York, New York 10032, USA. ²Departments of Biochemistry and Molecular Biophysics,
Columbia College of Physicians and Surgeons, Columbia University, New York, New York 10032, USA. ³Department of Neuroscience, Columbia College of Physicians and Surgeons, Columbia
University, New York, New York 10032, USA. ⁴HHMI/Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, Virginia 20147, USA. ⁵National Institute of Dental and Craniofacial Research,
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Figure 1 | Place preference by photostimulation of the sweet and bitter 
cortical fields. a, Sample injection of reporters in stereotactic coordinates
defining the sweet and bitter cortical fields. Top: sweet cortex labelled with
AAV-GFP and bitter cortex with AAV-TdTomato; bottom: a horizontal
section. See Extended Data Fig. 1 for additional data. b, Coronal section
of a mouse brain (bregma −0.2) stained with TO-PRO-3 (blue). Shown is
a representative histological sample of the bitter cortical field expressing
ChR2 fused to yellow fluorescent protein (ChR2-YFP), illustrating the
location and trajectory (dotted lines) of the implanted guide cannula;
IC, insular cortex. c, In vivo recording of ChR2-expressing insular cortical
neurons in response to light stimulation (ten pulses, 10 Hz). The expanded
traces show responses to each light pulse (blue bars below the trace). d, Left:
representative tracking of a mouse during the 5 min preference test in a
two-chamber arena; chamber 1 was coupled to light stimulation of the sweet
cortical field during the training sessions. Shown are the fractions of time
spent in each chamber. Right: quantitation of preference index before (pre-)
and after (chamber 1) training with photostimulation of the sweet cortical
field (n = 13 animals; Mann–Whitney U-test, P < 0.003). Preference can be
readily reversed by light stimulation in the opposing side (chamber 2, n = 6;
P < 0.02). e, Representative mouse track and quantitation of preference
index in mice expressing ChR2 in the bitter cortical field; note significant
aversion to the chamber coupled to photostimulation (chamber 1, n = 15;
Mann–Whitney U-test, P < 0.005); this behavioural aversion can be switched
to the opposite chamber by re-exposure to photostimulation in chamber 2
(n = 4; P < 0.03). Values are mean ± s.e.m. See Extended Data Fig. 2b for
GFP control injections.


stimulus. Therefore, we asked whether photostimulation of the sweet 
cortical field in animals expressing ChR2 in sweet cortex could switch 
preference for an otherwise aversive tastant. Conversely, we also tested 
whether photostimulation of the bitter cortical field triggers aversion to 
an otherwise sweet, attractive tasting chemical. Our results (Extended 
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Figure 2 | Photostimulation of bitter and sweet cortical fields drives 
aversive and appetitive behaviours. a, b, Representative raster plots (left) 
and histograms (right) illustrating licking events during a 5s licking window 
in the presence (blue) or absence (open) of light stimulation of (a) the bitter 
and (b) the sweet cortical fields. The purple line at time zero indicates the 
start of each trial; the green line indicates the onset of water delivery. 

c, d, Quantitation of licking responses with and without light stimulation in 
(c) the bitter cortical field (n = 34, Mann-Whitney U-test, P<4 x 10~”) or 
(d) sweet cortical field (n = 31, Mann-Whitney U-test, P<5 x 10~°) of 
wild-type mice. e, f, Quantitation of licking responses in TRPM5 knockout 
mice (e, bitter cortical fields, n =9, Mann-Whitney U-test, P<5 x 10>; 

f, sweet cortical fields, n = 10, Mann-Whitney U-test, P=0.001). Each point 
indicates data from an individual mouse before and after photostimulation. 


Data Fig. 3) show both postulates to be correct, and highlight how 
activation of selective taste cortical fields can mask the hedonic value 
of oral taste stimulation. 

The experiments described above show that direct control of primary 
taste cortex can evoke specific, reliable, and robust behaviours naturally 
symbolic of taste responses to chemical tastants. These gain-of-function 
studies also illustrate how top-down control of the taste pathway can 
activate innate, immediate responses representing sweet and bitter taste. 

To formally demonstrate that these cortically triggered behaviours are 
innate (that is, independent of learning or experience) we performed 
similar stimulation experiments in mice that had never tasted sweet or 
bitter chemicals (TRPM5 null mice¹¹; Extended Data Fig. 4). Indeed,
our results (Fig. 2e, f) showed that even in animals that had never expe- 
rienced sweet or bitter taste, ChR2 activation of the corresponding 
cortical fields still triggered the appropriate behavioural response, thus 
substantiating the predetermined nature of the sense of taste. 

It has been known for a long time that decerebrated animals can 
still exhibit stereotyped attraction and aversion to sweet and bitter 
chemicals!’. This is thought to be mediated by brainstem taste 
circuits dedicated to immediate responses''’”. Therefore, to evaluate 
the necessity (and sufficiency, see next section) of taste cortex in taste 
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Figure 3 | Go/no-go taste discrimination task in head-restrained mice. 
a, Schematic and flow chart of the go/no-go taste discrimination task. Each
trial starts with a visual cue (purple line), followed 1 s later by a tone (green
line) to alert mice to initiate licking. After sampling, mice were given 3 s
to continue to lick (go) or withhold licking (no-go) in response to the test
tastant. For go trials, mice were rewarded with water (3 s) if they chose to
lick within the 3-s interval. For no-go trials, mice received a mild air puff
to the eyelid if they failed to withhold licking. After the reward/penalty
phase, the spout retracted and was cleared for the next trial; inter-trial
intervals were 8 s. b, Representative histograms illustrating recognition
and generalization within bitters and sweets. This animal was trained
and tested with 4 mM AceK (sweet no-go) and 0.5 mM quinine (bitter
go), and then assayed with 100 mM sucrose and 10 μM cycloheximide
(CYX). c, Quantitation in nine animals, demonstrating highly reliable taste
recognition and discrimination. Values are mean ± s.e.m.


recognition and discrimination, we needed to design a test that bypasses 
immediate taste responses, and instead engages cortical circuits. In this 
assay (go or no-go behavioural test)'*"*, thirsty animals were trained to 
sample a test tastant from a spout, and then to report its identity either 
by licking (go) or withholding licking (no-go) (Fig. 3). This learned 
behaviour required the animal to sample the cue, recognize the tastant, 
and execute the appropriate behaviour in each trial. We trained ani- 
mals several ways, including to go to bitter and no-go to sweet, exactly 
the opposite of the innate drive. After 10-15 sessions of training (each 
consisting of 80 trials, with 40 randomly presented sweet and 40 bitter 
cues), mice were able to report the tastant’s identity with almost 90% 
accuracy (Fig. 3). To further demonstrate the selectivity of the assay and 
responses, we next tested the animals with sweet and bitter chemicals 
not used in the training phase. Given that all sweet tastants activate 
the same sweet taste receptor!>"!”, and all bitters the same class of taste 
receptor cells¹⁸, we expected that novel sweets should also be recognized
as no-go cues, whereas novel bitters should be seen as go cues.
Indeed, animals trained with the bitter tastant quinine and the artificial 
sweetener acesulfame K (AceK) recognized and responded with similar 
accuracy to cycloheximide and sucrose, bitter and sweet tastants with 
completely different chemical structures from the training set (Fig. 3c). 

We implanted cannulae bilaterally into the bitter cortical fields of 
trained animals (Supplementary Table 1), waited 2 weeks for recovery, 
and assayed tastant discrimination in the go/no-go behavioural test 
before and after bilateral injection of a glutamate receptor antagonist 
(NBQX) to silence cortical activity¹⁹,²⁰. As shown in Fig. 4, silencing
the bitter cortical fields prevented animals from reliably identifying the 
bitter tastant (see Extended Data Fig. 5 for additional examples using 
the reverse training test). In contrast, their ability to recognize sweet 
tastants remained unimpaired. Importantly, the loss of bitter taste func- 
tion was fully reversible upon washout of the drug (Fig. 4a), whereas 
injection of a saline control in the bitter cortical fields had no significant 
effect on either bitter or sweet taste sensing (Fig. 4b). We used the same 
strategy to conduct loss-of-function experiments in the sweet cortex. 
Indeed, bilateral silencing of the sweet cortical fields disrupted sweet, 
but not bitter, taste discrimination (Fig. 4c, d). As expected, animals 


514 | NATURE | VOL 527 | 26 NOVEMBER 2015 


[Figure 4 graphic not reproduced. Panels a–d; recoverable labels: b, Saline (bitter cortex); d, Saline (sweet cortex); training conditions ‘Bitter: no-go; Sweet: go’ (upper panels) and ‘Bitter: go; Sweet: no-go’ (lower panels); x axis, Tastants (Sweet, Bitter).]

Figure 4 | Inactivation of the bitter and sweet cortical fields disrupts
taste discrimination. a, Quantitation of performance ratios (see Methods)
before and after bilateral silencing (NBQX, 5 mg ml⁻¹) of the bitter cortical
fields (n = 8); animals were trained to no-go to bitter and go to sweet. Note
the impact on bitter taste discrimination, but no significant effect on sweet
taste. After washout of the drug, the animal's ability to recognize bitter is
restored. Comparable results are obtained when animals are instead trained
to go to bitter and no-go to sweet (Extended Data Fig. 2). b, Quantitation
of performance ratios with saline controls in bitter cortical fields; there is
no significant effect on sweet or bitter taste (n = 5; Mann–Whitney U-test,
P = 0.14). c, Quantitation of performance ratios with bilateral injection of
NBQX in the sweet cortical fields (n = 8). Animals were trained to no-go to
sweet and go to bitter; note significant deficit in sweet taste, but no effect on
bitter taste. After washout of the drug, the animal's ability to recognize sweet
is restored. d, Saline injections in the sweet cortical fields have no significant
effect on bitter or sweet taste (n = 7; Mann–Whitney U-test, P = 0.80).
Values are mean ± s.e.m. Mann–Whitney U-test, **P < 0.01, ***P < 0.001.


recovered sweet taste perception after drug washout. Taken together, 
these results substantiate the essential role of the sweet and bitter cor- 
tical fields in sweet and bitter taste recognition. 

What is the mouse sensing upon direct activation of a taste cortical 
field? Does optogenetic stimulation create internal representations that 
mimic those evoked by sweet and bitter chemicals on the tongue? If so, 
we reasoned that animals trained to recognize and report the sensory 
features of an orally provided sweet or bitter tastant (for example, in a 
go/no-go assay) should respond similarly to optogenetic stimulation 
of the corresponding cortical fields, even though the animal had never 
been trained with light stimulation. In essence, if light and the chemical
tastant evoke similar percepts, then light will generalize to the learned 
responses associated with the orally supplied stimulus. 

We first focused on sweet, because activation of the bitter cortical 
field evokes prototypical and highly salient orofacial responses that are 
already strongly indicative of bitter perception (Supplementary Video 1). 
We introduced ChR2 into the sweet cortical field of untrained mice 
and validated robust light-triggered appetitive responses (see Fig. 2). 
Then, the mice were trained in a go/no-go behavioural test where they 
learned to associate go with a bitter chemical and a low-salt solution 
(Fig. 5a), and no-go with sweet taste. Critically, under this test, mice 
needed to report both an aversive (bitter) and an attractive cue (low 
salt, see also Extended Data Fig. 6) in the same arm of the behavioural 
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Figure 5 | Cross-generalization between orally supplied taste stimuli 

and photostimulation of the sweet cortex. a, Representative histograms 
illustrating mouse performance during a training session in the go/no-go 
discrimination task. The mouse was trained to go to bitter (0.5 mM quinine) 
and low salt (20 mM NaCl), and no-go to sweet (4mM AceK). Note that 
both bitter (aversive) and low salt (attractive) were used in the same branch 
of the behavioural task (go) to exclude the valence as an identifier. 

b, Left: representative histograms illustrating cross-generalization between 
taste stimulation and photostimulation of the sweet cortical field. Right: 
quantitation of the responses from individual animals to quinine, AceK, salt 
and salt + light (n= 8, Mann-Whitney U-test, P <0.0002). 


test, hence removing pure valence²¹ as a way of identifying tastants.
After mice performed at or above 80% accuracy (Fig. 5a), we assayed 
whether light (previously triggering strong appetitive responses) was 
being sensed and reported as sweet (now a no-go response). Animals 
were tested with 50 randomized trials consisting of 20 bitter, 10 sweet, 
10 low salt, and 10 low salt linked to light stimulation of the sweet cor- 
tical field. Our results (Fig. 5b) showed that light stimulation of sweet 
cortex was indeed being sensed as a ‘fictive’ sweet stimulus, eliciting 
strong and reliable no-go responses; Extended Data Fig. 7 shows similar 
experiments and equivalent findings with bitter cortex. Taken together, 
these results show that activation of a taste cortical field recapitulates 
an internal representation (for example, perceptual quality) naturally 
indicative of the orally presented chemical. 
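
The composition of the probe session described above (50 randomized trials: 20 bitter, 10 sweet, 10 low salt, and 10 low salt paired with light) can be illustrated with a short sketch. This is not the authors' software; the trial labels, the function name `build_probe_session` and the use of a seeded shuffle are assumptions made purely for illustration.

```python
import random
from collections import Counter

# Trial counts are taken from the text; the label names are hypothetical.
PROBE_COMPOSITION = {
    "bitter": 20,
    "sweet": 10,
    "low_salt": 10,
    "low_salt_plus_light": 10,
}


def build_probe_session(composition=PROBE_COMPOSITION, seed=None):
    """Return a shuffled list of trial labels with the requested counts."""
    rng = random.Random(seed)  # seeded generator so a session can be reproduced
    trials = [label for label, n in composition.items() for _ in range(n)]
    rng.shuffle(trials)        # randomize presentation order in place
    return trials


session = build_probe_session(seed=0)
counts = Counter(session)      # per-label tallies, useful as a sanity check
```

Shuffling a fixed list, rather than drawing each trial independently, guarantees that every session contains exactly the stated number of each trial type.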

The essential role of the sense of taste is to evaluate the quality of 
a food source or a meal, and to activate the appropriate behavioural 
actions to consume or reject ingestion¹. The taste cortex is thought to
represent the basic sensory features of the different taste qualities’””?, 
and to function as a central neural ‘hub’ that informs and integrates 
with other brain areas, and the internal state, to guide taste-dependent 
actions. 

This work centred on the study of the two most distinctive taste 
qualities, sweet and bitter. These two differ not only in quality but 
also in valence, mediating innately attractive and aversive behaviours. 
Many studies have used optogenetics to activate ensembles of neurons 
and examine their physiological and behavioural consequences®**~””. 
In this work we explored the internal representation of arguably the 
two most recognizable chemosensory percepts. Our current studies 
demonstrate that it is possible to govern an animal's perception and 
behavioural responses by direct manipulation of selective taste cortical 
fields. Notably, unlike our other fundamental chemical sense (smell), 
activation of the sweet and bitter cortical fields evokes predetermined 
behavioural programs, independent of learning and experience, further 
illustrating the hardwired and innate nature of the sense of taste. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Stereotaxic injections and anatomy. All procedures were performed according to
the approved protocols at Columbia University. Six- to eight-week-old C57BL6/J
and Trpm5⁻/⁻ mice were used for viral injections. All surgeries were performed
using aseptic technique. Mice were anaesthetized with ketamine and xylazine
(100 mg/kg body weight and 10 mg/kg body weight, intraperitoneal), placed into
a stereotaxic frame, and unilaterally injected with ~30 nl AAV carrying ChR2
(AAV9.CamKIIa.hChR2(H134R)-EYFP.WPRE.SV40, Penn Vector Core) either
in the sweet cortical field (bregma 1.6 mm; lateral 3.1 mm; ventral 1.8 mm), or
the bitter cortical field (bregma −0.3 mm; lateral 4.2 mm; ventral 2.8 mm). After
viral injection, a guide cannula (26 gauge, PlasticsOne) or a customized implantable
fibre (200 μm, numerical aperture = 0.39) was implanted 300–500 μm above
the injection site, and fixed in place with dental cement. A metal head-post was
also attached and secured with dental cement for the purpose of head fixation
during behavioural experiments. For pharmacological experiments, AAV-ChR2
was injected bilaterally in the sweet or bitter cortical fields, followed by bilateral
implantation of guide cannulae. Mice were allowed to recover for 2–3 weeks before
the start of behavioural experiments. Placements of viral injections, guide cannulae,
and implantable fibres were histologically verified at the termination of
the experiments by TO-PRO-3 (1:1,000, Invitrogen) staining of coronal sections
(100 μm). Fluorescent images were acquired using a confocal microscope (FV1000,
Olympus).

Animals. All behavioural experiments with wild-type animals used 6- to 8-week- 
old male C57BL6/J mice. No statistical methods were used to predetermine sample 
size, and investigators were not blinded to group allocation. No method of random- 
ization was used to determine how animals were allocated to experimental groups. 
In vivo recordings. Mice expressing ChR2 in taste cortex were anaesthetized with
urethane (1.8 mg/g body weight), and the insular cortex was exposed as previously
described⁴. Extracellular neural activity was recorded using a tungsten electrode
(resistance 2.0–4.0 MΩ, FHC). Data were acquired, amplified, digitized, and
bandpass filtered at 600–6,000 Hz with a Neuralynx data acquisition system. For
photostimulation, 10 Hz, 5-ms pulses of 473 nm light (~5 mW) were delivered via
a solid-state laser (Shanghai Laser & Optics Century Co.) coupled to an optical
fibre (200 μm) positioned above the insular cortex.
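
To make the stimulation parameters concrete, the following sketch computes the on/off times of a regular light-pulse train, such as the 10 Hz, 5-ms pulses used for recordings (ten pulses per train, as in the Fig. 1c legend). The function names `pulse_train` and `duty_cycle` are hypothetical; this is a timing illustration under those stated parameters, not the acquisition or laser-control code used in the study.

```python
def pulse_train(rate_hz, pulse_ms, n_pulses):
    """Return (onset_ms, offset_ms) pairs for a regular train of light pulses."""
    period_ms = 1000.0 / rate_hz  # time between successive pulse onsets
    if pulse_ms > period_ms:
        raise ValueError("pulse width exceeds the inter-pulse period")
    return [(i * period_ms, i * period_ms + pulse_ms) for i in range(n_pulses)]


def duty_cycle(rate_hz, pulse_ms):
    """Fraction of time the light is on during a train."""
    return rate_hz * pulse_ms / 1000.0


# Recording protocol from the text: 10 Hz, 5-ms pulses, ten pulses per train.
recording_train = pulse_train(rate_hz=10, pulse_ms=5, n_pulses=10)
```

The same helper applies to the place-preference protocol (20 Hz, 20-ms pulses), for which the light is on for a much larger fraction of each stimulation interval.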

c-Fos induction and immunohistochemistry. Individual mice were implanted
with an intraoral cannula²⁸ 3 days before c-Fos induction. On the day of experiments,
mice were anaesthetized with urethane (1.6 mg/g body weight) and the
trachea was cannulated to aid breathing during oral stimulus presentation. Tastants
were perfused into the mouth through the intraoral cannula for 1.5 h at a rate of
~6 ml h⁻¹. Mice were allowed to rest for 30 min and processed for immunostaining
as previously described. The brains were sectioned coronally at 100 μm, and
labelled with goat anti-c-Fos (Santa Cruz, sc-52-G) overnight; Alexa 488 donkey
anti-goat or Cy3 donkey anti-goat (Jackson ImmunoResearch) were used to visualize
c-Fos expression. All images were taken using an Olympus FluoView 1000
confocal microscope.

Place preference assays. Individual mice were tested in a custom-built two-chamber
arena (30 cm × 30 cm total size). To differentiate the chambers, one chamber
was designed with alternating black and white vertical stripes on its walls,
whereas the other chamber was uniformly black. The arena was contained within 
a sound-attenuating cubicle (Med Associates). Mice were trained in the arena for 
30 min with photostimulation of the sweet or bitter cortical field, and tested in the 
absence of any light stimulation for 5 min at the end of each session (defined as 
‘preference test’). Animal locations were tracked in real time by video imaging. At 
the beginning of the experiments, mice were acclimated to the arena for one session 
without light stimulation (defined as the pre-test condition). Photostimulation 
sessions began the next day, with two daily sessions for about 1 week. For each 
mouse, one chamber was randomly selected for photostimulation (chamber 1); 
when a mouse was located in this chamber, light was delivered (20 Hz, 20-ms 
pulses, 5-10 mW) for 5-s intervals, with 5-s rest periods to avoid over-stimulation 
or phototoxicity. After 1 week of sessions, a ‘reverse probe’ study was performed in 
a subset of animals, during which photostimulation was delivered in the opposing 
chamber (chamber 2). Animals were trained for a minimum of eight sessions, and 
the preference tests from the last three sessions were used to calculate the preference
index (PI); PI = (t₁ − t₂)/(t₁ + t₂), where t₁ is the fractional time a mouse spent
in chamber 1, and t₂ is the time spent in chamber 2.
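
A minimal sketch of the preference-index calculation follows, assuming the formula reads PI = (t1 − t2)/(t1 + t2), with t1 and t2 the times spent in chambers 1 and 2. The function names are hypothetical, and averaging the per-session PI over the last three preference tests is an assumption about how those sessions were combined; the text does not say whether indices were averaged or times were pooled.

```python
def preference_index(t1, t2):
    """PI = (t1 - t2) / (t1 + t2); ranges from -1 (all chamber 2) to +1 (all chamber 1)."""
    total = t1 + t2
    if total <= 0:
        raise ValueError("total time in the two chambers must be positive")
    return (t1 - t2) / total


def mean_preference_index(sessions, last_n=3):
    """Average the PI over the last `last_n` sessions (assumed pooling rule).

    `sessions` is a list of (t1, t2) pairs, one per preference test.
    """
    recent = sessions[-last_n:]
    return sum(preference_index(t1, t2) for t1, t2 in recent) / len(recent)
```

For example, a mouse that spends 75% of a test in the stimulated chamber has PI = 0.5, and equal occupancy gives PI = 0.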

Lick preference assays. Mice were first water-deprived for 24 h to motivate drinking
behaviour. They were then introduced to head restraint and acclimated to
drinking from a motor-positioned spout in 60-trial sessions (15 min), twice a day
for 3 days. Each trial began with a flash, followed 1 s later by the spout swinging
into position and a tone (4 kHz) to indicate the onset of water delivery. The
spout remained in position for 5 s and was then removed. Mice were weighed daily


during the habituation period as well as during any behavioural tests requiring
water restriction. Additional water was supplied as necessary to ensure that animals
maintained at least 85% of their initial body weight. To measure attractive/appetitive
responses, mice were mildly water restrained (exhibiting an average
of not more than 15 licks per 5-s trial in the lick preference assay), and supplied
with approximately 5 μl water during each trial. To measure aversion, mice were
water-deprived for 24 h, and supplied with approximately 10 μl water distributed
over the full 5 s of spout presentation for each trial (so that animals remained
eager to lick for all 5 s). To ensure animals were appropriately motivated in the
lick preference behavioural assays (that is, thirsty to examine lick suppression,
and mildly satiated to examine attraction), we examined animals exhibiting an
average of at least 20 licks per 5-s trial as an indicator of ‘thirst’, and not more than
15 licks per 5-s trial for mild satiation. Animals were recorded by video for the
entire session, and licks were analysed and counted by custom-written MATLAB 
software (Mathworks). Light stimulation and water delivery were controlled by 
the same software via an Arduino board. All animals analysed in these studies had 
histologically confirmed expression of ChR2 in the sweet or bitter cortical fields 
(Supplementary Table 1). 
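
The motivation criteria above (an average of at least 20 licks per 5-s trial for ‘thirst’, not more than 15 for mild satiation) can be sketched as a simple check. This is an illustration, not the authors' MATLAB code; the function name, the constant names and the "intermediate" label for animals meeting neither criterion are assumptions made here.

```python
from statistics import mean

THIRSTY_MIN_LICKS = 20   # at least 20 licks per 5-s trial indicates thirst
SATIATED_MAX_LICKS = 15  # not more than 15 licks per 5-s trial: mild satiation


def motivation_state(licks_per_trial):
    """Classify an animal from its lick counts in 5-s water trials."""
    average = mean(licks_per_trial)
    if average >= THIRSTY_MIN_LICKS:
        return "thirsty"
    if average <= SATIATED_MAX_LICKS:
        return "mildly satiated"
    # Hypothetical label: the text gives no category for 15-20 licks.
    return "intermediate"
```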

Go/no-go taste discrimination behaviour. Mice deprived of water for 24h were 
first acclimated to consuming water in a head-restrained position for 15-min 
sessions over 2-3 days. Animals were then trained to perform a taste discrimi- 
nation task, in which they were to lick, and receive a water reward, in response 
to a 2-µl presentation of tastant-1 (‘go’) and to withhold licking in response to 
tastant-2 (‘no-go’). The presentation of the go and no-go stimuli was randomized. 
Each trial began with a visual cue (100-ms light flash), followed 1s later by a tone 
(4 kHz, 300 ms) alerting the animal to sample the test tastant (for example, AceK or 
quinine; ~2 µl per sample). After sampling, mice were given 3s either to continue 
to lick the spout (go trial) or to withhold licking (no-go trials). On go trials, if a 
mouse chose to lick within the 3-s interval, it was then rewarded with water for 
3s. On no-go trials, if a mouse failed to withhold licking within the 3-s interval, 
it was given a penalty of a gentle air puff to the eyelid. Mice were trained for two 
sessions per day, with 80 randomized trials (20 min) per session. For analysis, a ‘go’ 
response was defined as four or more licks in the second before reward or penalty. 
For photostimulation experiments, mice were first trained until they could effec- 
tively discriminate the tastants with ~90% accuracy (over 1-2 weeks). Then, on the 
‘probe’ sessions, tastants and/or cortical photostimulation were presented during 
the sample period. Neither reward nor punishment was delivered for novel tastants 
or light stimulation. Before testing, animals with correctly placed cannulae were 
provisionally identified by ChR2 expression followed by one or two sessions of lick 
preference pre-tests. All animals analysed had histologically confirmed placement 
of cannulae and expression of ChR2 in the appropriate cortical field. 
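
The ‘go’ response criterion (four or more licks in the second before reward or penalty) can be illustrated with the short sketch below. It is not the authors' analysis code; the data layout (lick timestamps in seconds and an outcome time per trial) and all names are assumptions made for this example.

```python
GO_LICK_THRESHOLD = 4   # four or more licks counts as a 'go' response
WINDOW_SECONDS = 1.0    # the second before reward or penalty


def is_go_response(lick_times, outcome_time):
    """True if at least four licks fall in the 1 s before outcome_time."""
    window_start = outcome_time - WINDOW_SECONDS
    licks_in_window = sum(
        1 for t in lick_times if window_start <= t < outcome_time
    )
    return licks_in_window >= GO_LICK_THRESHOLD


def go_percentage(trials):
    """Percentage of trials scored as 'go'.

    trials: list of (lick_times, outcome_time) tuples for one tastant.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    n_go = sum(1 for licks, outcome in trials if is_go_response(licks, outcome))
    return 100.0 * n_go / len(trials)
```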
Pharmacological inhibition. Mice were trained to discriminate sweet from bitter 
in the go/no-go task with at least 90% accuracy. On the day of the experiment, 
mice were first tested with four taste stimuli (pre-test), including the original 
training tastants (2 mM AceK and 0.1 mM quinine) and a novel sweet and bitter 
tastant (50 mM sucrose and 2 µM cycloheximide). After the test, 0.3 µl of the gluta- 
mate receptor antagonist NBQX (5 mg ml⁻¹ in 0.9% NaCl, Tocris Bioscience) was 
bilaterally infused into the chosen insular cortical fields over a period of 3 min. 
NBQX was delivered via an internal infusion needle inserted into the same guide 
cannulae used for light stimulation and connected to a 1-µl Hamilton syringe 
(PlasticsOne). Saline (0.9% NaCl) was used as control. After NBQX or saline 
infusion, animals were placed in their home cages to rest for 1.5h. Mice were then 
re-tested with the same four taste stimuli on the go/no-go task (NBQX-test) and 
then at 8-24h after rest (recovery-test). During tests, a water reward was given 
for correctly identifying the go cue, but no air puff was delivered for incorrectly 
identifying the no-go cue (to avoid possible re-learning). No reward or punish- 
ment was applied for the novel sweet and bitter tastants. A performance ratio 
was calculated for each taste quality: ratio = $r_1/r_2$, where $r_1$ is the percentage of 
correct responses during the NBQX-test or recovery-test, and $r_2$ is the percentage 
of correct responses during the pre-test. The percentage of correct responses for 
each taste quality was the average of go (%) for go taste stimuli (for example, 
quinine and cycloheximide), or the difference (100 − go (%)) for no-go 
stimuli (for example, AceK and sucrose). All animals analysed had anatomically 
confirmed placement of cannulae in the appropriate cortical field. We note that we 
made several unsuccessful attempts to optogenetically silence the sweet and bitter 
cortical fields; this may have been due in part to the requirement for expression 
in most, if not all, relevant neurons. 
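
The performance ratio described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function names, and the choice to average over the tastants of one taste quality with a plain mean, are assumptions.

```python
from statistics import mean


def percent_correct(go_percentages, trained_response):
    """Percentage of correct responses for one taste quality.

    go_percentages: go (%) for each tastant of that quality
                    (for example, the trained and the novel tastant).
    trained_response: "go" or "no-go", the response trained for the quality.
    """
    if trained_response == "go":
        return mean(go_percentages)
    if trained_response == "no-go":
        # Withholding is correct, so score 100 - go (%) for each tastant.
        return mean(100.0 - g for g in go_percentages)
    raise ValueError("trained_response must be 'go' or 'no-go'")


def performance_ratio(r1, r2):
    """ratio = r1 / r2: r1 from the NBQX-test or recovery-test, r2 from the pre-test."""
    if r2 <= 0:
        raise ValueError("pre-test percentage must be positive")
    return r1 / r2
```

Under this reading, a ratio near 1 means discrimination of that taste quality was unchanged relative to the pre-test, and a ratio well below 1 means it was impaired.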


28. Tokita, K., Armstrong, W. E., St John, S. J. & Boughter, J. D. Jr. Activation of 
lateral hypothalamus-projecting parabrachial neurons by intraorally delivered 
gustatory stimuli. Front. Neural Circuits 8, 86 (2014). 
Extended Data Figure 1 | Expression of ChR2 in taste cortex. a, Samples 
of injection sites in the bitter and sweet cortical fields; shown are coronal 
sections (Fig. 1a shows a whole mount brain). ChR2-YFP expression 
(green), nuclei (blue; TO-PRO-3); numbers indicate position relative to 
bregma, and the dotted areas highlight the location of the taste cortical 
fields (see c). b, Activation of insular neurons in sweet cortex triggers 
robust c-Fos expression; ChR2-YFP (green), c-Fos (red) after 10 min of 
in vivo photostimulation at 20 Hz, 20-ms pulses (5 s laser on, 5 s laser off, 
5 mW). Dashed lines indicate the location of the stimulating cannulae/fibre. 
c, c-Fos (red) expression in bitter cortex (bregma 0, −0.2) after bitter 
tastant stimulation (10 mM quinine; see Methods for details). Note the 
absence of c-Fos expression in the middle (bregma +0.7) and sweet insular 
cortex (bregma +1.5). Importantly, specific labelling is abolished in 
taste-blind animals (TRPM5 knockouts; middle row). The bottom row shows a 
diagram of the corresponding brain areas, adapted from the Allen Brain 
Atlas. Scale bars: 1 mm (a), 500 µm (b), 300 µm (c). PIR, piriform cortex; 
IC, insular cortex. 


© 2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure: panel a, preference index against session number (0 to 10) for sweet cortex and bitter cortex; panel b, mouse track and preference index, with chamber 2 labelled.] 

Extended Data Figure 2 | Acquisition of place preference. a, The 
development of ‘place preference’ as a function of session number (each 
session was 30 min of training and 5 min of ‘after-training’ testing in the 
absence of light stimulation; n = 13 for sweet cortex, n = 15 for bitter 
cortex; see text and Methods for details). The average of sessions 6–8 was 
used in Fig. 1. Values are mean ± s.e.m. b, Representative mouse track and 
quantitation of preference index in control GFP-expressing mice; note no 
difference in preference between chambers (n = 14; Mann–Whitney 
U-test, P = 0.74). Values are mean ± s.e.m. 
Extended Data Figure 3 | Photostimulation of insular cortical fields 
overcomes natural taste valence. a, Quantitation of licking responses 
in mice expressing ChR2 in the bitter cortical fields (n = 13, analysis of 
variance (ANOVA) test, Tukey’s honest significant difference post hoc 
test). Photostimulation of the bitter cortical fields significantly suppresses 
the natural attraction of the sweet tastant (4 mM AceK). b, Quantitation 
of licking responses in mice expressing ChR2 in the sweet cortical fields 
(n = 14, ANOVA test, Tukey’s honest significant difference post hoc test). 
Photostimulation of the sweet cortical fields significantly overcomes the 
natural aversion of the bitter tastant (1 mM quinine). In both experiments, 
mice were water-restrained (but exhibited an average of not more than 
30 licks per 5-s water trial) such that they were motivated to drink the 
bitter while showing attraction to sweet. Values are mean ± s.e.m. 




[Figure: licks per 5 s for each tastant in TRPM5 knockout mice.] 

Extended Data Figure 4 | TRPM5 knockout mice do not taste sweet 
and bitter. Taste preference was tested in the head-restrained assay for 
wild type and TRPM5 homozygous mutants. Tastants were randomly 
delivered for a 5-s window (ten trials each). No significant difference was 
observed between water and sweet/bitter tastants in TRPM5 knockouts 
(ANOVA test, P = 0.62, n = 10; see ref. 10 for more details); circles indicate 
individual animals; bar graphs show mean ± s.e.m. 
Extended Data Figure 5 | Inactivation of the bitter cortical fields in 
animals trained to go to bitter and no-go to sweet. a, Quantitation of 
performance ratios before and after bilateral silencing of the bitter cortical 
fields (NBQX, 5 mg ml⁻¹; n = 7) in animals trained to go to bitter and 
no-go to sweet. Note the impact in bitter taste discrimination, but no 
significant effect in sweet taste (Mann–Whitney U-test, P < 0.002). After 
washout of the drug, the animal’s ability to recognize bitter is restored 
(Mann–Whitney U-test, P < 0.005). b, Quantitation of performance 
ratios with saline (0.9%) control in the bitter cortical fields (n = 6, 
Mann–Whitney U-test, P = 0.56). In both experiments, mice were trained 
with quinine and AceK, and tested with two pairs of sweet/bitter tastants 
(0.1 mM quinine and 2 mM AceK, 2 µM cycloheximide and 50 mM 
sucrose; see Methods for details). 
Extended Data Figure 6 | Sweet and low salt are appetitive tastants. 
Taste preference was tested during a 10-min window using the head- 
restrained assay (see Methods for details). Four tastants were randomly 
delivered to animals for 5 s each (ten trials per tastant). Note that animals 
show significant attraction to sweet (AceK) and low salt (NaCl), but 
strong aversion to bitter (n = 11, ANOVA test, Tukey’s honest significant 
difference post hoc test); circles indicate individual animals; bar graphs 
show mean ± s.e.m. These conditions were used in the experiments 
described in Fig. 5 and Extended Data Fig. 7. 
Extended Data Figure 7 | Cross-generalization between orally supplied 
taste stimuli and photostimulation of the bitter cortex. a, Representative 
histograms illustrating cross-generalization between taste stimulation and 
photostimulation of the bitter cortical field. The mouse was trained to go 
to bitter (0.5 mM quinine) and no-go to sweet (4 mM AceK) and low salt 
(20 mM NaCl). b, Quantitation of the responses from individual animals 
to quinine, AceK, salt and salt + light (n = 8, Mann–Whitney U-test, 
P < 0.002). See also Fig. 4. 
Drosophila Ionotropic Receptor 25a mediates 
circadian clock resetting by temperature 


Chenghao Chen!*, Edgar Buhl**, Min Xu!, Vincent Croset*, Johanna S. Rees‘, Kathryn S. Lilley*, Richard Benton’, 


James J. L. Hodge? & Ralf Stanewsky! 


Circadian clocks are endogenous timers adjusting behaviour and 
physiology with the solar day’. Synchronized circadian clocks 
improve fitness? and are crucial for our physical and mental well- 
being®. Visual and non-visual photoreceptors are responsible for 
synchronizing circadian clocks to light*”, but clock-resetting is 
also achieved by alternating day and night temperatures with only 
2-4°C difference®*. This temperature sensitivity is remarkable 
considering that the circadian clock period (~24h) is largely 
independent of surrounding ambient temperatures!*, Here we show 
that Drosophila Ionotropic Receptor 25a (IR25a) is required for 
behavioural synchronization to low-amplitude temperature cycles. 
This channel is expressed in sensory neurons of internal stretch 
receptors previously implicated in temperature synchronization 
of the circadian clock’. IR25a is required for temperature- 
synchronized clock protein oscillations in subsets of central clock 
neurons. Extracellular leg nerve recordings reveal temperature- 
and IR25a-dependent sensory responses, and IR25a misexpression 
confers temperature-dependent firing of heterologous neurons. We 
propose that IR25a is part of an input pathway to the circadian clock 
that detects small temperature differences. This pathway operates 
in the absence of known ‘hot’ and ‘cold’ sensors in the Drosophila 
antenna!®"" revealing the existence of novel periphery-to-brain 
temperature signalling channels. 

In Drosophila, daily activity rhythms are controlled by a network 
of ~150 clock neurons expressing the clock genes period (per) and 
timeless (tim). These encode repressor proteins that negatively feed- 
back on their own promoters resulting in 24h oscillations of clock 
molecules. Temperature cycles (TC) synchronize molecular clocks 
present in peripheral appendages in a tissue-autonomous manner®”?, 
whereas synchronization of clock neurons in the brain mainly depends 
on peripheral temperature receptors located in the chordotonal organs 
(ChO) and the ChO-expressed gene nocte”!*, 

To discover novel factors involved in temperature entrainment, we 
identified NOCTE-interacting proteins by co-immunoprecipitation 
and mass-spectrometry (Extended Data Table 1)'*. We focused on 
IR25a, a member of a divergent subfamily of ionotropic glutamate 
receptors and verified the interaction by co-immunoprecipitation 
after overexpressing IR25a and NOCTE in all clock cells using tim-gal4 
(Extended Data Fig. 1a). IR25a is expressed in different populations 
of sensory neurons, including those in the antenna and labellum!>~!’. 
In the olfactory system IR25a acts as a co-receptor with different 
odour-sensing IRs"°. 

To investigate if IR25a is co-expressed with nocte in ChO, we ana- 
lysed IR25a expression in femur and antennal ChO using an IR25a- 
gal4 line¹⁵ (Extended Data Fig. 2a). IR25a-gal4-driven mCD8-GFP 
labelled subsets of ChO neurons in the femur, overlapping substan- 
tially with nompC-QF-driven QUAS-Tomato signals (using the QF 
binary transcriptional activation system) (Fig. 1a–c). nompC-QF is 
expressed in larval ChO¹⁸ and in the adult femur ChO (Fig. 1d, e). 
Comparison of IR25a-driven mCD8-GFP and nuclear DsRed sig- 
nals with those of other ChO neuron drivers (F-gal4 and nocte-gal4 
(ref. 9)) suggests that IR25a is expressed in a subset of femur ChO neurons 
and Johnston’s Organ (JO) neurons (Fig. 1c and Extended Data 
Fig. 1b–g). To determine if IR25a-gal4 ChO signals reflect endoge- 
nous IR25a expression, we confirmed the presence of IR25a mRNA in 
the femur and leg (Extended Data Fig. 2b, e) and the co-localization 
of anti-IR25a immunofluorescence signals in femur ChO neurons 
(Fig. 1f, g). IR25a was detected in ChO neuron cell bodies and cili- 
ated dendrites, as was an mCherry-IR25a fusion protein expressed 
in these cells (Fig. 1h). 

As nocte¹ mutants do not synchronize to 12 h:12 h 16 °C:25 °C 
temperature cycles in constant light (LL)⁹ (Extended Data Fig. 3a), 
we analysed IR25a−/− mutants¹⁶ under these conditions. Unlike 
nocte¹, the IR25a−/− flies synchronized well to this regime and we 
obtained similar results at warmer temperature cycles (Extended Data 
Fig. 3a). To test whether IR25a is specifically required for synchro- 
nization to small temperature intervals”!>, we subjected IR25a−/− 
flies to various temperature cycles with an amplitude of only 2 °C. 
Surprisingly, and in contrast to wild-type, IR25a−/− mutants did not 
synchronize to any of the shallow temperature cycles in LL or con- 
stant darkness (DD) (Fig. 2a–e and Extended Data Figs 3b and 4c). 
In LL, wild-type and IR25a rescue flies showed a clear activity peak 
in the second part of the warm period before and after the 6 h shift 
of the temperature cycle. By contrast, IR25a−/− mutants were con- 
stantly active throughout the temperature cycle, apart from a short 
period of reduced activity at the beginning of the warm phase of TC1 
(Fig. 2a and Extended Data Fig. 3b). In DD, control flies slowly 
advanced (or delayed) their evening activity peak during phase- 
advanced (or delayed) temperature cycles (Fig. 2b and Extended Data 
Fig. 4c). The phase of this activity peak was maintained in the subse- 
quent free-running conditions (DD, constant 25 °C) indicating stable 
re-entrainment of the circadian clock (Fig. 2b and Extended Data 
Fig. 4). By contrast, IR25a mutants did not shift their evening peak 
during the temperature cycle, keeping their original phase throughout 
the experiment (Fig. 2b and Extended Data Fig. 4c). 

To quantify entrainment in LL, we determined the ‘entrainment 
index’ (EI), whereas for most DD experiments we calculated the 
phase difference of the main activity peak upon release into constant 
conditions between IR25a mutants and controls. In all 2 °C amplitude 
temperature cycles tested the entrainment index of IR25a−/− flies was 
significantly lower and phase calculation indicated no phase shift or 
a significantly reduced phase shift compared to controls (Fig. 2c–e). 
The same non-synchronization phenotype was observed in IR25a−/ 
Df(IR25a) flies, and temperature synchronization was fully restored in 
IR25a−/− rescue flies (Fig. 2a–d and Extended Data Fig. 3b). IR25a−/− 
mutants synchronize to light and have normal free-running and 
Figure 1 | IR25a is expressed in ChO neurons. 
a, Overview of the femur ChO adapted from). 
b, d, Double labelling of the femur ChO by IR25a- 
gal4 (b) and F-gal4 (d) driven mCD8-GFP and 
nompC-QF driven QUAS-Tomato. c, e, Higher 
magnification of circled ChO areas in b and d, 
respectively. f, IR25a immunolabelling of femoral 
ChO cryosections of IR25a-gal4/UAS-mCD8-GFP 
flies. From left to right, GFP, anti-IR25a, 22C10, 
and merged images are shown. g, Anti-IR25a 
and 22C10 labelling of femur ChO sections of 
IR25a−/− flies. h, Subcellular distribution of an 
mCherry-IR25a fusion protein co-labelled with 
the dendritic cap marker nompA-GFP in the 
femur ChO. Scale bar, 20 µm. 


temperature-compensated periods (Fig. 2b, Extended Data Fig. 4d and 
Extended Data Table 2). These results suggest that IR25a enables the 
circadian clock to sense subtle temperature changes across the entire 
physiological range, rather than mediating synchronization to a specific 
range. Increasing the temperature cycle amplitude to 4 °C consistently 
restored temperature entrainment in IR25a−/− flies (Extended Data 
Fig. 4a, b). 

Temperature receptors located in fly antennae and arista are not 
required for temperature-synchronized behaviour®!"!”. As expected, 
we found that antennal IR25a function (Extended Data Figs 1c 
and 2a)!* is not required for temperature entrainment (Extended Data 
Fig. 5). To reveal the importance of IR25a expression in ChO neurons, 
we performed tissue-specific IR25a RNA interference (RNAi) using 
validated transgenes (Extended Data Figs 2d and 6a). IR25a RNAi in 
all or subsets of ChO neurons (Fig. 1 and Extended Data Fig. 1) resulted 
in a lack of entrainment (Extended Data Figs 2e and 6b, c). By contrast, 
IR25a RNAi in multidendritic, TRPA1-expressing or clock neurons did 
not impair temperature entrainment (Extended Data Fig. 6c). These 
findings are consistent with the absence of IR25a expression in clock 
neurons and the brain (Extended Data Fig. 2e-g) and show that IR25a 
functions in ChO neurons for temperature entrainment to 25 °C:27 °C 
temperature cycles in LL. 

To identify the neural substrates underlying the lack of behavioural 
synchronization, we quantified clock protein levels in wild-type, 
IR25a−/−, and IR25a−/− rescue flies exposed to a shallow tempera- 
ture cycle in LL. Although TIM expression was robustly rhythmic and 
synchronized in all clock neuronal groups in controls, TIM was barely 
detectable in the Dorsal Neuron 1 (DN1) and DN2 of IR25a−/− flies 
(Fig. 3a and Extended Data Fig. 7a, b). Moreover, in the small and large 
ventral lateral neurons (s-LNv and l-LNv), TIM expression exhibited 
an additional peak during the warm phase (Fig. 3a and Extended Data 
Fig. 7a, b). In the DN3, TIM declined earlier compared to controls and 
there was no effect on the dorsal lateral neurons (LNd). In temperature 
cycles in DD, TIM levels in DN1 were also blunted but oscillations 
in the DN2 and DN3 were similar to controls. In contrast to LL, TIM 
did not oscillate in any of the LN groups and was at constantly low 
levels (Fig. 3b), consistent with the behavioural results obtained under 
these conditions (Fig. 2b, d). The alterations of TIM expression are 
temperature specific, as we observed normal oscillations in LD cycles 
at 25 °C (Extended Data Fig. 7c). An increase of the temperature cycle 
amplitude to 4 °C also restored normal TIM expression in IR25a−/− 
flies, in agreement with the behavioural rescue (Extended Data 
Figs 4a, b and 7d). In summary, in low-amplitude temperature cycles, 
IR25a is required for normally synchronized TIM oscillations in DN1–3 
and LNv in LL and in DN1 and LN clock neurons in DD. 

We tested if the clock neurons affected by the lack of IR25a are 
indeed involved in regulating behavioural synchronization to 
shallow temperature cycles by blocking synaptic transmission using 
tetanus-toxin (TNT). Indeed, TNT-expression in DN1 and DN2 
blocked synchronization in LL, whereas in DD only DN1 blockage 
interfered with temperature entrainment (Fig. 3c, d)?°. Consistent 
with the differential effect on TIM oscillations in LL and DD 
(Fig. 3a, b) these results strongly suggest that IR25a is required for the 
synchronized output of the DN1 (LL and DD) and DN2 (LL) to control 
temperature-entrained behaviour. 

Next, we asked if ChO might directly sense temperature in an 
IR25a-dependent manner. We recorded leg nerve activity in restrained 
preparations and identified ChO units in the compound signal 
(Fig. 4a). In both wild-type and IR25a−/− flies, spontaneous leg 
movement changed as a function of temperature along with motor 
and sensory activity. Additionally, presumed ChO activity of wild- 
type flies also increased during periods without movement (Fig. 4b, 
third insert). This temperature-induced, but movement-independent, 
ChO activity was absent in IR25a−/− flies, showing that temperature 
Figure 2 | IR25a is required for temperature synchronization to 
low-amplitude temperature cycles. a, Upper part shows double plotted 
average actograms depicting the daily activity levels and environmental 
conditions during the entire experiment. White areas, LL and 25 °C; orange 
areas, LL and 27 °C. Histograms show daily average activity levels during 
the initial LL treatment and the last 3 days of each temperature cycle. 
Light orange, 25 °C; dark orange, 27 °C; white bars, activity levels in LL. 
Error bars indicate s.e.m.; numbers (n) in the upper-right corner; x axis, 
Zeitgeber time (h) and y axis total activity (beam crossings per 30 min). 
b, As in a but flies were initially kept in LD 25 °C, before being exposed to 
a 7 h phase advanced temperature cycle in DD (dark histogram bars) and 
free-running conditions (DD and 25 °C). Actogram shading as in a but 
grey areas indicate darkness. Green and red arrows indicate the position 
(phase) of the main activity peak during the final free run for control and 
mutant flies, respectively. c, e, Entrainment index values (mean ± s.e.m.) 
during 25 °C:27 °C temperature cycles in LL (delay as in a) (c), and as 
indicated in e (all delay, except 25 °C:27 °C, advance) (see Extended Data 
Fig. 3b for actograms and daily average plots). In c, per⁰¹ and nocte¹ 
flies were used as negative controls. ***P < 0.001, **P < 0.01, NS, not 
significant, one-way ANOVA followed by Bonferroni correction. 
d, Phase difference during DD and constant temperature after temperature 
cycles between IR25a−/− (n = 12/11/12 for 7 h advance/8 h delay/8 h 
advance temperature cycles, respectively) and y w control (n = 16/10/14, 
respectively) and IR25a−/− rescue flies (n = 16/18/12). ****P < 0.0001, 
***P < 0.001, **P < 0.01; F-statistic (Watson–Williams–Stevens test). 


is sensed in the legs in an IR25a-dependent manner (Fig. 4c). To test 
if IR25a contributes directly to temperature-sensing, we ectopically 
expressed this channel in the physiologically well-characterized, 
IR25a-negative, l-LNv (Extended Data Fig. 2f). As a positive control, 
we also expressed the temperature-sensitive Drosophila TRPA1 chan- 
nel²¹ in the l-LNv. Isolated brains were exposed to a temperature ramp, 
and spike frequency of individual l-LNv was recorded. Control l-LNv 
did not show a significant temperature-dependent change in neural 
activity (Fig. 4d). As expected, the firing rate of TRPA1 expressing 
Figure 3 | IR25a is required for clock protein oscillations in central 
clock neurons. a, b, TIM levels in clock neurons during LL (a) and DD 
(b) 25 °C:27 °C temperature cycles at the indicated time points (Zeitgeber 
time, ZT). At least 8 brain hemispheres per time point were analysed for 
each genotype. Error bars indicate s.e.m. c, Progeny of UAS-IMP-TNT 
and UAS-TNT females crossed to Clk4.1M-gal4 (DN1>, upper panel) or 
Clk9M-gal4; Pdf-gal80 (DN2>, lower panel) males were exposed to two 
6 h-delayed temperature cycles (12 h at 25 °C:12 h at 27 °C in LL). Left, 
actograms, shading as in Fig. 2a. Right, entrainment index calculations 
(mean ± s.e.m.), numbers in bars indicate n. **P < 0.01; one-way 
ANOVA followed by Bonferroni correction. d, Same genotypes as in c 
were exposed to an 8 h delayed 12 h 25 °C:12 h 27 °C temperature 
cycle in DD. Left, actograms plotted as in Fig. 2b. Right, phase difference 
of activity peaks during final constant conditions between controls 
(DN1/DN2 > UAS-IMP-TNT, n = 9/12, respectively) and the indicated 
genotypes (DN1/DN2 > TNTE, n = 16/10). ****P < 0.0001, NS, not 
significant, F-statistic (Watson–Williams–Stevens test). 


neurons drastically increased linearly with temperature, as did other 
cellular parameters (Extended Data Fig. 8). IR25a expression resulted 
in a linear and reversible temperature-dependent increase in action 
potential firing frequency (Fig. 4e, i), whereas other cellular parame- 
ters showed no difference (Fig. 4f–h). Increasing the temperature by 
only 2–3 °C also led to a reversible increase in firing frequency of 
1.03 ± 0.20 Hz in IR25a expressing l-LNv (Fig. 4j). By contrast, expres- 
sion of the related, but olfactory-specific co-receptor IR8a (which is 
not required for temperature entrainment, Fig. 2c) did not confer 
temperature-sensitivity (Extended Data Fig. 8). These observations 
suggest that IR25a is at least part of a thermosensory receptor required 
for temperature entrainment. 

Our data indicating that IR25a contributes to temperature sens- 
ing within ChO extend the roles of IRs beyond chemoreception, 
reminiscent of the requirement for the ‘gustatory receptor’ Gr28b in 
warmth-avoidance~’. Although we show that IR25a-expressing leg neu- 
rons are capable of sensing temperature and mediating temperature 
entrainment, it is possible that this receptor has a similar role else- 
where in the peripheral nervous system. IR25a responds to small tem- 
perature changes and we propose that the fly continuously integrates 
Figure 4 | IR25a is required for temperature- 
induced leg nerve responses and confers 
temperature sensitivity to l-LNv. a, Schematic of 
the setup. b, Recording of a control fly leg nerve 
including motor and sensory axons. The first 
extended insert shows a discharge of presumed 
ChO sensory units in response to manual 
extension of the tibia (green bars). Heating the 
preparation from 20 °C to 30 °C (middle, red 
trace) led to spontaneous leg movement with 
concurrent motor and sensory activity (second 
insert) but also to increased sensory firing in the 
absence of leg or motor activity (third insert), 
which was reversible with intact tibia extension 
response (fourth insert) (n = 9). c, IR25a−/− 
shows similar responses to tibia extension 
and temperature-dependent leg movement, 
but no sensory activity in response to elevated 
temperature (n = 6). d, Whole-cell current clamp 
recordings of l-LNv control and Pdf > IR25a 
temperature signals received from multiple ChO across the whole body 
for synchronization of the clock. This potential reliance on weakly 
responding temperature receptors might explain why the Drosophila 
circadian clock is insensitive to brief temperature pulses”, which could 
help maintain synchronized clock function in natural conditions of 
rapid and large temperature fluctuations. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Plasmids and germline transformations. To generate the psp-flag-strep II-nocte-ha 
(FSNH) construct, a flag-strepII-venus-strepII (fsvs) fragment was amplified from a 
PiggyBac/P-element YFP-flag-strep II construct of'* using a Phusion High-Fidelity 
PCR kit (New England Biolabs). This 900 bp fragment was sub-cloned into psp73 
(Promega) to generate psp73-fsvs with BglII/Xholl sites. To introduce a strepII 
tag upstream of the NOCTE N terminus, a 0.5 kb fragment was amplified from 
psp73-nocte-HA (containing the entire nocte coding region fused to ha; Giesecke 
and Stanewsky, unpublished) by annealing the strep II tag directly using PCR. This 
fragment was religated back into psp73-nocte-ha to generate psp-strep I-nocte-ha. 
A 3x Flag tag was introduced 5′ of psp73-strep I-nocte-ha by sub-cloning the 
strep I-nocte-ha fragment into psp-fsvs with BstBI/XholI sites replacing venus. 
To generate the psp-flag-strep I-nocte-strep II construct (FSNS), a Strep II tag was 
amplified and annealed 3’ of psp-nocte (Giesecke and Stanewsky, unpublished) to 
generate psp-nocte-strep II, followed by sub-cloning into psp-fsnh using BstBI/Xholl 
sites. FSNH and FSNS were sub-cloned into the transformation vector pUAST 
using BglII/Xholl sites, and transgenic flies were generated using classical transpo- 
sase-mediated germline transformation. To generate mCherry-IR25a, the coding 
sequence of IR25a lacking the endogenous signal sequence (starting from codon 
31) was PCR amplified, subcloned into pUAST-mCherry attB’, and integrated 
into attP2. To generate the IR25a genomic rescue construct, the bacterial artificial 
chromosome (BAC) CH322-32C20~ was integrated into attP16, and then recom- 
bined onto the IR25a² mutant chromosome. Restoration of IR25a expression by 
this BAC (in IR25a², CH322-32C20/IR25a², CH322-32C20 animals) was verified 
by immunostaining with anti-IR25a antibodies (data not shown). All constructs 
generated in this study were confirmed by DNA sequencing. 

Fly strains. Flies were kept at 25°C or 18°C on common cornmeal-yeast-sucrose 
food under light:dark cycles and 60–70% humidity. As controls, wild-type Canton 
S and y w flies (both carrying the ls-tim allele) were used. The following flies used 
in this study were previously described or obtained from the Bloomington Stock 
Center: tim-gal4:67 (ref. 25), Clock856-gal4 (ref. 26), F-gal4:33-5 (ref. 27), nocte-gal4 
(ref. 9) IR25a-gal4 (ref. 15), gmr-gal4 (BL1104), Pdf-gal4 (ref. 28), elav-gal4; UAS- 
dicer (BL25750), UAS-dicer (BL24646), trpA1-gal4 (BL27593), nompC-gal4 (ref. 29), 
nompC-QF (BL36346), ppk-gal4 (BL32078), UAS-GFP (ref. 25), nompA-GFP™, 
UAS-mCherry (BL52268), QUAS-mtdTomato (BL30037), UAS-mCD8-GFP; UAS- 
DsRed?, Pdf-RFP*', UAS-TrpA1 (ref. 21)UAS-IR25a°, y per” w and y per’ w°?, 
IR25a−/−: either homozygous IR25a² or IR25a¹/IR25a² flies, both null mutant 
alleles generated by gene targeting¹⁶, IR25a², CH322-32C20/IR25a², CH322-32C20 
(ref. 15) (outcrossed to Canton S for six generations and here referred to as IR25a 
rescue), IR8a¹: null allele of IR8a and referred to as IR8a−/− !°, nocte¹: encodes 
truncated version of the NOCTE protein⁹, UAS-TNT-E and UAS-IMP-TNT-V1-B 
(inactive). Clk4.1M-gal4 and Clk9M-gal4;Pdf-gal80 flies were used to direct GAL4 
expression to subsets of the DN1p and to the DN2, respectively”°*4. IR25a-RNAi 
lines 15627-R1 and 15627-R2 were obtained from the NIG-Fly Stock Center. w1118;
Df(2L)Exel6010/CyO was used as IR25a deficiency (BL7496).

Immunostaining and quantification. GFP and/or RFP signals were analysed as 
described in’. Briefly, antennae and legs were fixed and dissected in 4% para- 
formaldehyde/PBS solution. Samples were then washed 3 times in 3% PBST at 
room temperature followed by mounting in Vectashield (Vector Labs) medium and 
inspected using a Leica TCS SP5 confocal microscope. To visualize endogenous 
IR25a expression in the ChO of fly antennae and legs, cryosections (16 µm) and
immunolabelling were performed as described in ref. 35 with minor modifications:
sections were collected on slides and fixed for 10 min in 4% formaldehyde in PBS.
After washing for 2 × 10 min in PBS, sections were treated for 30 min in PBS + 0.1%
Triton X-100 (PBT) and incubated in 5% normal goat serum (NGS) for 30 min. 
Primary antibodies (rabbit anti-IR25a 1:500 (ref. 16), mouse anti-22C10 1:200, 
DSHB) were diluted in PBT with NGS and applied to slides placed horizontally 
in humidified chambers and left for 2h at room temperature followed by incuba- 
tion overnight at 4°C. After washing for 3 x 10 min in PBT, slides were blocked 
in PBT with NGS for 30 min and incubated with secondary antibodies (rabbit 
AlexaFluor-594, 1:500; mouse AlexaFluor-647, 1:500; Invitrogen) diluted in PBT
in the dark for 4h at room temperature. Slides were washed 3 x 5 min in PBS and 
mounted in Vectashield before observation. Immunostaining of whole-mounted 
brains was performed as described in ref. 36 with minor modifications. For LD experiments,
flies were fixed on the fifth day of light entrainment. For temperature experiments,
flies were first reared in LL at 25 °C for 3 days, and then transferred to
25 °C:27 °C or 25 °C:29 °C temperature cycles for 7 days. Temperature cycles were
rectangular and not ramped. Therefore, the conditions are consistent with those
for behavioural analysis. For temperature entrainment in DD, flies were initially
entrained to LD at 25 °C for 2 days, followed by 25 °C:27 °C temperature cycles in
DD that were shifted 8 h in advance with respect to the previous LD cycle. Brains
were dissected on day 6 of the temperature cycles at the indicated time points. 
Primary rat anti-TIM (1:1,000; ref. 37) and secondary rat AlexaFluor-594 antibodies
(Invitrogen, 1:500) were applied. Mounted brains were scanned using a Leica
TCS SP5 confocal microscope. Quantification of TIM signals was performed as
in ref. 38 with minor modifications: pixel intensity of stained neurons and background
staining in each neuronal group was measured using ImageJ. Background signal 
was determined by taking the average signal of two surrounding fields of each neu- 
ronal group and was subtracted from the neuronal signal. For each group of clock 
neurons, at least 8 hemispheres from each genotype were checked and measured 
per time point. Data were normalized to the peak: the value at each time point
was divided by the peak value, so that the peak equals 1.
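
The background subtraction and peak normalization described above can be sketched as follows. This is a minimal illustration only: the study used ImageJ for the intensity measurements, and the function names, the dictionary layout and all intensity values below are hypothetical and are not taken from the data.

```python
from statistics import mean


def background_corrected(neuron_intensity, background_fields):
    """Subtract the average of the surrounding background fields from the neuronal signal."""
    return neuron_intensity - mean(background_fields)


def normalize_to_peak(signal_by_timepoint):
    """Divide every time point by the peak value so that the peak equals 1."""
    peak = max(signal_by_timepoint.values())
    return {zt: value / peak for zt, value in signal_by_timepoint.items()}


# Hypothetical pixel intensities (arbitrary units), NOT data from this study.
# Each entry is (neuronal signal, (background field 1, background field 2)).
raw = {
    "ZT2": [(40.0, (10.0, 12.0)), (44.0, (11.0, 13.0))],
    "ZT14": [(90.0, (10.0, 10.0)), (110.0, (9.0, 11.0))],
}

# Mean background-corrected signal per time point, then peak normalization.
corrected = {
    zt: mean(background_corrected(signal, fields) for signal, fields in cells)
    for zt, cells in raw.items()
}
relative = normalize_to_peak(corrected)
```

In this sketch the time point with the highest mean signal ends up at 1 and every other time point is expressed as a fraction of it.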
Co-immunoprecipitation. Co-immunoprecipitation experiments were performed 
as described". For each protein purification, 200-300 mg wet-weight of heads 
from gmr-gal4 flies expressing UAS-nocte-flag (FSNS and FSNH transgenics were 
used in 2 independent experiments) or gmr-gal4 alone (negative controls) were 
collected on dry ice and manually homogenized with a 2 ml Dounce homogenizer 
(Fisher) in 1 ml of extraction buffer (final protein concentration 5 mg ml−1 extraction
buffer) containing 50 mM Tris, pH 7.5, 125 mM NaCl, 1.5 mM MgCl2, 1 mM
EDTA, 5% glycerol, 0.4% NP-40, and 0.1% Tween 20. To prevent degradation 
during the lengthy purification steps, 2x protease mini EDTA inhibitor mixture 
(Roche) was added at hourly intervals throughout the procedure. The homogenate 
was centrifuged at 10,000 r.p.m. for 15 min to isolate the soluble fraction used for 
pull-down. For the Flag pull-down procedure, EZview Red anti-Flag M2 affinity 
gel (Sigma) was used to bind the Flag-tagged bait and its bound partners. 50 µl of pre-washed
50% slurry was added to 1 ml soluble protein and incubated at 4 °C for 2 h
on a rotary mixer. Non-binding material was removed by centrifugation (8,000g 
for 2 min) and the resin was washed three times in ice-cold extraction buffer. For 
checking the interaction with IR25a (Extended Data Fig. 1a), the washed resin was 
directly boiled with 5× SDS loading buffer followed by routine western blot. To
elute the isolated protein complexes (Flag-tagged protein with any associated
proteins), the resin was incubated and eluted three times, each with 50 µl Flag
peptide (100 µg ml−1; Sigma) in extraction buffer for 30 min at 4 °C on a rotary mixer. The
three eluates were combined and any residual resin was removed by centrifugation 
at 8,000g for 2 min. The following mass spectrometry peptide sequencing was 
performed by Cambridge Centre for Proteomics. Briefly, eluates from the tagged 
line and untagged control flies, were processed as described", The only deviation 
from the method described was that peptides were applied to a 180 µm × 20 mm
(5 µm particle size) C18 trap column (Waters UPLC Trap Symmetry) coupled to
a nanoAcquity UPLC system (Waters) using 0.1% formic acid in water (buffer A)
at a flow rate of 10 µl min−1. Peptides were then separated on a 75 µm × 250 mm
(1.7 µm particle size) reverse phase BEH C18 analytical nano-column (Waters) at a
flow rate of 300 nl min−1 using a gradient of buffer A and buffer B (0.1% formic acid
in acetonitrile). The HPLC system was directly coupled to a LTQ Orbitrap Velos 
(Thermo Scientific) with a New Objective nanospray ionisation source operated 
at a resolution of 60,000. Peptides were eluted with a linear gradient of 5-45% 
buffer B over 45 min, followed by a re-equilibration step, giving a total running time of
60 min. The Orbitrap analyser survey scan was performed over a mass range of 
m/z 380-1,500 each of them triggering 10 MS2 LTQ acquisitions of the ten most 
intense ions exceeding 500 counts using a data dependent acquisition mode. 
Western blot. To confirm the interaction between NOCTE and IR25a, total
head proteins were isolated from flies expressing IR25a, or IR25a and Flag-tagged
NOCTE, under the control of tim-gal4. Boiled beads (after co-immunoprecipitation) were loaded
on SDS-PAGE gels, followed by standard western blot. Primary rabbit anti-IR25a
(1:5,000; ref. 16) and mouse anti-Flag M2 (1:1,000; Sigma), and secondary
HRP-conjugated goat anti-rabbit IgG (1:10,000) and goat anti-mouse IgG
(1:1,000) antibodies (Jackson) were used.

RNA isolation and RT-PCR. For RNA extractions, 30-50 flies were collected in 
2 ml RNAlater (Ambion) and kept at 4 °C overnight; 100 µl 0.1% PBST was
added to help RNAlater penetration. Femurs from around 200 fly legs and 50 
retinas were quickly dissected in cold RNAlater. Total RNA was extracted using an 
RNEasy kit (QIAGEN) according to the manufacturer’s instructions. Total RNA 
was finally eluted in RNase-free water and stored at —80°C. cDNA synthesis was 
performed with Reverse Transcription Reagents Kit (Applied Biosystems) in 10 µl
reactions using 1 µg of total RNA according to the manufacturer's instructions.
To verify mRNA expression level of IR25a and nocte in fly femur ChO, dilutions 
of cDNA were used for PCR with the following primers: rp49 and nocte (ref. 9), 
IR25a (ref. 39), followed by DNA electrophoresis on 2% agarose gels to visualize 
the PCR products. To test the efficiency of IR25a RNAi, 20 heads of 5-10-day-old 
flies were dissected and RNA was extracted and reverse transcribed as described 
above. Taqman probes for IR25a (catalogue number 4351372, ThermoFisher) and 
RPL32 (catalogue number 4448489, ThermoFisher) were applied to determine 
the amount of mRNA. For determining IR25a mRNA levels in different tissues, 
body parts were dissected in RNAlater (Ambion). 1 µg RNA was used for cDNA
synthesis. Real-time assays were performed using an ABI GeneAMP PCR system
9700 using the standard program, and Ct (threshold cycle) values were applied to
determine the amount of RNA in each genotype. The relative concentrations were
calculated using the 2^(−ΔΔCt) method, and RPL32 was used as control.
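
As an illustration of the 2^(−ΔΔCt) calculation, the following minimal sketch computes the relative expression of a target transcript (here IR25a) against a reference transcript (here RPL32). The function name and the Ct values are hypothetical and serve only to show the arithmetic; they are not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of the target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference), computed per genotype
    ddCt = dCt(sample) - dCt(control genotype)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles later in the
# knockdown sample than in the control, with an unchanged reference gene.
knockdown = relative_expression(27.0, 18.0, 25.0, 18.0)  # 0.25 of control level
```

A ΔΔCt of 2 therefore corresponds to a fourfold reduction, under the usual assumption that amplification efficiency is close to 100% for both probes.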
Behavioural analysis. Analysis of locomotor activity of 4-5-day-old male flies was 
performed using the Drosophila Activity Monitor System (DAM, Trikinetics). The 
DAM monitors, as well as an environmental monitor (Trikinetics), were located 
inside a light- and temperature-controlled incubator where the fly’s activity was 
monitored for a few weeks depending on different experimental conditions. 
Plotting of behavioural activity and period calculations were performed using a 
signal-processing toolbox (ref. 40) implemented in Matlab (MathWorks). In order to
quantify behaviour during temperature cycles, an updated Histogram version 
based on Excel (Office, Microsoft) was applied**. Briefly, the activity from the 
last two days of each temperature cycle was plotted in Excel in 30-min bins and 
an ‘entrainment index’ (EI = ratio of activity occurring during the 6 h window 
covering the main activity peak of the positive controls over the activity during the 
entire warm phase) was calculated. To distinguish the clock-controlled behavioural 
peaks from temperature response peaks, a simple smoothing filter was applied for 
the four activity bins during the 2h following each temperature transition**. The 
filtered data were used for calculation, whereas the raw activity data are plotted
in the histograms. The entrainment index values plotted in all histograms repre- 
sent the average entrainment index from the temperature cycles before and after 
the shift except for 18°C:20°C and 21°C:23 °C, where the entrainment index was 
generated from the temperature cycles after the shift. To calculate the phase of the 
main activity peaks after DD and temperature cycles involving genotypes that did 
not show clear activity peaks during entrainment, we employed circular phase 
plot analysis as previously described**". In brief, the mean activity phase of the 
three consecutive days after release into constant conditions was determined for 
each fly of the two genotypes to be compared. An average ‘vector’ indicating phase 
coherence (length) and mean peak phase (direction) is calculated for each genotype 
and the two vectors are compared by an F-statistic (Watson–Williams–Stevens
test). The difference in direction is plotted in hours (h) and the mean peak phase 
of the controls (negative or positive controls, depending on the experiment) was 
set to zero. 
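
The entrainment index and the mean phase vector described above can be sketched as follows. This is an illustrative reimplementation in Python, not the Excel-based tool used in the study. It assumes a 12 h warm phase divided into 24 bins of 30 min, a 6 h window of 12 bins, and a 24 h period; the function names, the window position and the activity counts are hypothetical. The smoothing filter applied after temperature transitions and the Watson–Williams–Stevens test are not implemented here.

```python
import math


def entrainment_index(warm_phase_bins, window_start, window_bins=12):
    """Ratio of activity in a 6 h window (12 bins of 30 min) to activity
    during the entire warm phase."""
    total = sum(warm_phase_bins)
    if total == 0:
        return 0.0  # no activity at all during the warm phase
    window = warm_phase_bins[window_start:window_start + window_bins]
    return sum(window) / total


def mean_phase_vector(phases_h, period_h=24.0):
    """Average vector of individual peak phases (in hours).

    Returns (length, direction_h): length in [0, 1] measures phase coherence,
    direction_h is the mean peak phase in hours.
    """
    angles = [2 * math.pi * p / period_h for p in phases_h]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    length = math.hypot(x, y)
    direction_h = (math.atan2(y, x) * period_h / (2 * math.pi)) % period_h
    return length, direction_h


# Hypothetical activity counts: most activity falls in the second half of the warm phase.
activity = [1] * 12 + [3] * 12
ei = entrainment_index(activity, window_start=12)

# Hypothetical peak phases of two flies, 5 h and 7 h after release.
coherence, mean_phase = mean_phase_vector([5.0, 7.0])
```

Flies with activity spread evenly across the warm phase would give an entrainment index of 0.5 for a 6 h window in a 12 h warm phase, whereas activity concentrated in the window pushes the index towards 1.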

Electrophysiology. Extracellular leg nerve recording. The question of whether ChO 
in the legs would respond to temperature changes was examined using extracellular 
recordings in restrained intact leg preparations. Canton S w* flies were used 
as a control and IR25a−/− mutants were used to test if IR25a is involved in this
process. In order to minimize locomotory artefacts, flies were decapitated and 
all legs but the left hind leg amputated. Flies were mounted ventral side up and 
pinned down in a Sylgard (Dow Corning, USA) coated recording chamber so that 
the left hind leg was orientated perpendicular to the body but not immobilized 
(Fig. 4a). Therefore the ChO could be stimulated in vivo by moving the tibia with 
a fine needle. A tungsten wire electrode, sharpened to a fine point and inserted
through the cuticle into the thorax, served as a reference electrode, and a similar
recording electrode was placed in the coxa of the remaining leg. The final position
of the recording electrode was determined by monitoring the signal it was record- 
ing and then manually extending the tibia until a response in sensory units was 
seen. The signal was amplified using a BioAmp extracellular amplifier, filtered (low 
5 kHz, high 10 Hz), digitized (sampling frequency 10 kHz) with a PowerLab 2/20 
and recorded using LabChart 7 (ADInstruments, Bella Vista, Australia). 
Whole-cell recordings. Different genotypes were used for each group and the data 
pooled as there were no differences between them: control, Pdf-RFP and Pdf-gal4; 
UAS-mCherry; IR25a, Pdf-gal4/UAS-IR25a; Pdf-RFP and Pdf-gal4/UAS-mCherry-IR25a;
TrpA1, Pdf-gal4/UAS-TrpA1; Pdf-RFP and Gal1118-gal4/UAS-TrpA1/UAS-mCherry-IR8a;
Pdf-gal4/UAS-IR8a; Pdf-RFP and Pdf-gal4/UAS-mCherry-IR8a.
Experiments were performed under red light illumination and light exposure 
during dissection was kept to a minimum. For visualization of the l-LNv we used 
RFP-tagged constructs and a 555 nm LED light source in order to not activate 
cryptochrome. Adult flies raised in 12 h:12h LD at 25°C, were collected ~3-5 
days post eclosion between ZT13 and ZT16, decapitated and brains dissected in 
extracellular saline solution containing (in mM): 101 NaCl, 1 CaCl2, 4 MgCl2,
3 KCl, 5 glucose, 1.25 NaH2PO4, 20.7 NaHCO3, pH adjusted to 7.2. The brains
were transferred for 5-10 min to saline containing 20 U per ml papain with 1mM 
L-cysteine to digest the ganglion sheath. After removal of the photoreceptors, air 
sacks and trachea, a small incision was made over the position of the l-LNv neurons
in order to give easier access for the recording electrodes. Brains were placed
ventral side up in the recording chamber, secured using a custom-made anchor and
during recordings continuously perfused with aerated (95% O2, 5% CO2) saline
solution. l-LNv neurons were identified on the basis of their fluorescence, size and
position. A single recording was performed from one l-LNv per brain. Whole-cell
current clamp recordings were performed using glass electrodes with 10-20 MΩ
resistance filled with intracellular solution (in mM: 102 K-gluconate, 17 NaCl, 0.94
EGTA, 8.5 HEPES, 0.085 CaCl2, 1.7 MgCl2, pH 7.2) and an Axon MultiClamp 700B
amplifier, digitized with an Axon DigiData 1440A (sampling rate: 20 kHz; filter:
Bessel 10 kHz) and recorded using pClamp 10 (Molecular Devices, USA). A cell
was included in the analysis if the access resistance was less than 70 MΩ and the
leak current in response to a —40 mV pulse less than —100 pA. All chemicals were 
purchased from Sigma (Poole, UK). The liquid junction potential was calculated 
as 13 mV and was subtracted from all the membrane voltages. Resting membrane 
potential (Vm) was measured after stabilizing for 2-3 min. Membrane input resist- 
ance (Rin) was calculated by injecting hyperpolarizing current steps and measuring 
the resulting changes in voltage. Spike frequency was manually measured using 10s 
bins for each degree of temperature. To test the effect of elevated temperature, the 
recording chamber and the perfusion influx were gradually heated from 18°C to 
30°C within 5-10 min and cooled back to 18°C within 10-15 min using a Peltier 
heating system (ALA Scientific Instruments, USA) and TC-10 controller (npi, 
Tamm, Germany). The temperature coefficient Q10 was calculated by dividing the 
firing rate at 30 °C by the rate at 20 °C. To check whether l-LNvs can also sense small
temperature changes of 2-3 °C, neurons were recorded as before, the temperature 
increased to around 24.5 °C held for 3 min, then increased to 27.5°C, held again for 
3 min, cooled back down to 24.5 °C and recorded for a further 3 min. During the 
whole period the instantaneous spiking frequency was monitored. All values are 
given as mean and s.e.m. and a t-test and ANOVA (followed by Tukey test) were 
used to calculate significant differences. 
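
The derived electrophysiological quantities described above (temperature coefficient, junction-potential correction and input resistance) can be sketched as follows. The function names and all numerical inputs are hypothetical illustrations of the arithmetic, not values recorded in this study; only the 13 mV liquid junction potential and the 30 °C / 20 °C definition of Q10 come from the text.

```python
def q10(rate_at_30c, rate_at_20c):
    """Temperature coefficient: firing rate at 30 °C divided by the rate at 20 °C."""
    return rate_at_30c / rate_at_20c


def corrected_voltage(measured_mv, junction_potential_mv=13.0):
    """Subtract the calculated liquid junction potential (13 mV) from a membrane voltage."""
    return measured_mv - junction_potential_mv


def input_resistance_mohm(delta_v_mv, delta_i_pa):
    """Input resistance from a hyperpolarizing current step (Ohm's law).

    mV divided by pA gives GΩ, so the result is multiplied by 1,000 to obtain MΩ.
    """
    return delta_v_mv / delta_i_pa * 1000.0


# Hypothetical example values.
coefficient = q10(6.0, 3.0)                   # firing rate doubles between 20 °C and 30 °C
v_m = corrected_voltage(-40.0)                # measured -40 mV becomes -53 mV
r_in = input_resistance_mohm(-10.0, -10.0)    # -10 mV response to a -10 pA step
```

With these example inputs the cell would have a Q10 of 2, a corrected membrane potential of −53 mV and an input resistance of 1,000 MΩ.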

Data reporting. No statistical methods were used to predetermine sample size. 
The investigators were not blinded to allocation during experiments and outcome 
assessment. 


24. Venken, K. J. et al. Versatile P[acman] BAC libraries for transgenesis studies in
Drosophila melanogaster. Nature Methods 6, 431-434 (2009). 

25. Kaneko, M. & Hall, J. C. Neuroanatomy of cells expressing clock genes in 
Drosophila: transgenic manipulation of the period and timeless genes to mark 
the perikarya of circadian pacemaker neurons and their projections. J. Comp. 
Neurol. 422, 66-94 (2000). 

26. Gummadova, J. O., Coutts, G. A. & Glossop, N. R. Analysis of the Drosophila 
Clock promoter reveals heterogeneity in expression between subgroups of 
central oscillator cells and identifies a novel enhancer region. J. Biol. Rhythms 
24, 353-367 (2009). 

27. Kim, J. et al. A TRPV family ion channel required for hearing in Drosophila. 
Nature 424, 81-84 (2003). 

28. Park, J. H. & Hall, J. C. Isolation and chronobiological analysis of a
neuropeptide pigment-dispersing factor gene in Drosophila melanogaster.
J. Biol. Rhythms 13, 219-228 (1998).

29. Liu, L. et al. Drosophila hygrosensation requires the TRP channels water witch 
and nanchung. Nature 450, 294-298 (2007). 

30. Chung, Y. D., Zhu, J., Han, Y. & Kernan, M. J. nompA encodes a PNS-specific, 
ZP domain protein required to connect mechanosensory dendrites to sensory 
structures. Neuron 29, 415-428 (2001). 

31. Ruben, M., Drapeau, M. D., Mizrak, D. & Blau, J. A mechanism for circadian 
control of pacemaker neuron excitability. J. Biol. Rhythms 27, 353-364 (2012). 

32. Konopka, R. J. & Benzer, S. Clock mutants of Drosophila melanogaster.
Proc. Natl Acad. Sci. USA 68, 2112-2116 (1971).

33. Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted 
expression of tetanus toxin light chain in Drosophila specifically eliminates 
synaptic transmission and causes behavioral defects. Neuron 14, 341-351 
(1995). 

34. Zhang, Y., Liu, Y., Bilodeau-Wentworth, D., Hardin, P. E. & Emery, P. Light and 
temperature control the contribution of specific DN1 neurons to Drosophila 
circadian behavior. Curr. Biol. 20, 600-605 (2010). 

35. Saina, M. & Benton, R. Visualizing olfactory receptor expression and 
localization in Drosophila. Methods Mol. Biol. 1003, 211-228 (2013). 

36. Yoshii, T., Todo, T., Wulbeck, C., Stanewsky, R. & Helfrich-Forster, C. 
Cryptochrome is present in the compound eyes and a subset of Drosophila’s 
clock neurons. J. Comp. Neurol. 508, 952-966 (2008). 

37. Rush, B. L., Murad, A., Emery, P. & Giebultowicz, J. M. Ectopic CRYPTOCHROME
renders TIM light sensitive in the Drosophila ovary. J. Biol. Rhythms 21, 
272-278 (2006). 

38. Gentile, C., Sehadova, H., Simoni, A., Chen, C. & Stanewsky, R. Cryptochrome
antagonizes synchronization of Drosophila’s circadian clock to temperature 
cycles. Curr. Biol. 23, 185-195 (2013). 

39. Croset, V. et al. Ancient protostome origin of chemosensory ionotropic 
glutamate receptors and the evolution of insect taste and olfaction. PLoS 
Genet. 6, e1001064 (2010).

40. Levine, J. D., Funes, P., Dowse, H. B. & Hall, J. C. Signal analysis of behavioral 
and molecular cycles. BMC Neurosci. 3, 1 (2002). 

41. Simoni, A. et al. A mechanosensory pathway to the Drosophila circadian clock. 
Science 343, 525-528 (2014). 

42. Wilson, R. I. & Corey, D. P. The force be with you: a mechanoreceptor channel
in proprioception and touch. Neuron 67, 349-351 (2010). 

43. Stanewsky, R. et al. Temporal and spatial expression patterns of transgenes
containing increasing amounts of the Drosophila clock gene period and a lacZ 
reporter: mapping elements of the PER protein involved in circadian cycling. 
J. Neurosci. 17, 676-696 (1997). 


© 2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 1 image. Panel labels: western blot lanes tim > IR25a and
tim > IR25a + FLAG Nocte, with Input and IP: anti-FLAG blots (120 kDa marker);
IR25a > mCD8-GFP, Ds-Red-NLS; F-Gal4 > mCD8-GFP, Ds-Red-NLS; Nocte;
2nd antennal segment (Johnston's Organ); Femur.]

Extended Data Figure 1 | IR25a and Nocte physically interact in vivo
and are expressed in femur and antennal ChO neurons. a, In vivo
co-immunoprecipitation experiments using protein extracts from fly heads.
Head lysates were immunoprecipitated using anti-Flag antibody. The
immunoprecipitates were examined by western blotting using anti-Flag
and anti-IR25a antibody. Input represents 30% of cell lysates used in the
pull-down experiment. The genotypes of the flies used were: tim > IR25a:
UAS-GFP/UAS-IR25a; tim-gal4:67/+. tim > IR25a + FLAG-NOCTE:
UAS-FSNH/UAS-IR25a; tim-gal4:67/+. The bracket indicates that
NOCTE-Flag runs as a double band on western blots. For uncropped gel
images, see Supplementary Fig. 1. b, Overview of the antennal and
femur ChO adapted from refs 13, 42. c, d, Labelling of the JO neurons
by IR25a-gal4 (c) and F-gal4 (d) driven membrane-bound mCD8-GFP
and nuclear-localized DsRed expression. Note that IR25a is expressed in
only a subset of JO neurons. e, f, Same flies as in c, d analysed for IR25a
expression in the femoral ChO. Again, only subsets of the ChO neurons
express IR25a. Arrows in c, e point to ChO neuron nuclei. g, Labelling of
the ChO neurons in JO and femur by nocte-gal4-driven membrane-bound
GFP and nuclear DsRed expression (ref. 9). Scale bar, 20 µm.

Extended Data Figure 2 | Spatial and quantitative IR25a mRNA and
protein expression in CNS and PNS tissues and efficiency of RNAi-mediated
knockdown. a, Analysis of IR25a-gal4 and IR25a in the third
segment of the antenna reveals expression in coeloconic sensilla (ref. 16).
Schematic adapted from”. b, c, Determination of IR25a and nocte mRNA
levels in femur and retinal tissues by semiquantitative RT-PCR; rp49
was used as control. For uncropped gel data, see Supplementary Fig. 1.
d, e, qPCR analysis of IR25a mRNA levels in whole heads (d), or dissected
body parts (as indicated) (e) from flies of the genotypes indicated.
Pan-neuronal elav-gal4 knockdown (d) decreased IR25a mRNA >75%
or >90%, using one or two different RNAi lines combined, respectively.
****P < 0.0001, ***P < 0.001, *P < 0.05, one-way ANOVA followed by
Bonferroni correction. f, IR25a is not expressed in the central brain and
clock neurons. Left, IR25a immunolabelling of a Canton S brain reveals no
signals. Middle, same brain labelled with anti-TIM reveals expression
in clock neurons. Right, merge. Brains were dissected in LD at ZT20.
Scale bar, 10 µm. g, IR25a-gal4 is not expressed in clock neurons and largely
absent from the brain. Left, nuclear DsRed driven by IR25a-gal4. Second
from left, anti-PDF staining showing LNv and their projections. Middle,
anti-PER (diluted 1:5,000; ref. 43) showing all clock neurons. Second from right,
merge, showing two IR25a-gal4-positive cells in the antennal lobe, not
co-localized with any of the clock neurons. These cells were observed
in 4/8 hemispheres and always on the same side of the brain. Right,
magnified view of circled area in the merged image. Scale bar, 30 µm.

Extended Data Figure 3 | IR25a is required for temperature
synchronization to low-amplitude temperature cycles but not for
high-amplitude temperature cycles. a, Canton S, IR25a−/−, and nocte1
flies were exposed to LD at 20 °C for 5 days (left) or LL at 25 °C for 2 days
(right), followed by exposure to 12 h:12 h 20 °C:29 °C (left) or 16 °C:25 °C
(right) temperature cycles in LL, which after 6-7 days were delayed or
advanced by 6 h, respectively. Warmer temperature indicated by red and
orange shading, respectively. b, Actograms and daily averages of Canton S
and IR25a−/−, and IR25a−/− flies containing a genomic IR25a rescue
construct (rescue) exposed to 18 °C:20 °C temperature cycles in LL (left)
and 21 °C:23 °C temperature cycles in LL (right). Warm phase in actograms
indicated by orange shading. Histogram colour coding as in Fig. 2.
For quantification see Fig. 2e.

Extended Data Figure 4 | IR25a is not required for temperature
synchronization to high- but to low-amplitude temperature cycles
and IR25a−/− flies show normal LD and DD behaviour. a, Canton S
and IR25a−/− flies were exposed to LL at 25 °C for 2-3 days, followed by
exposure to high-amplitude 12 h:12 h temperature cycles in LL, which
after 5-6 days were delayed by 6 h. Double plotted average actograms
depicting the daily activity levels and environmental conditions during
the entire experiment are shown. Actual temperatures are colour coded
and indicated below the entrainment index calculations. Numbers (n)
indicated in the bars. b, As in a but flies were initially kept in LD and
DD for 2 days each (left) or LD (right), before being exposed to two
phase delayed (left) or advanced (right) temperature cycles in DD at the
temperatures indicated. n.s., not significant. c, Behaviour of IR25a−/− and
rescue flies during DD and 25 °C:27 °C temperature cycles with 8 h delay
and during DD and 21 °C:23 °C temperature cycles with an 8 h advance
compared to the previous LD cycle (at 25 °C). Warm phase is indicated
by orange shading. d, Canton S and IR25a−/− flies during LD and DD
conditions at 25 °C (see Extended Data Table 2 for period calculations).

Extended Data Figure 5 | Antennal IR25a expression is not necessary
for synchronization of locomotor activity rhythms to temperature
cycles. Ablation of antennae as indicated. a, IR25a−/− and rescue flies
were exposed to the same condition used in Fig. 2. Actograms and daily
averages as described before. b, Quantification of behaviour as described
in Fig. 2. The data of IR25a−/− with normal antennae were taken from
Fig. 2a. n.s., not significant.

Extended Data Figure 6 | Knocking down IR25a expression via RNA
interference disrupts synchronization of locomotor activity rhythms
to temperature cycles (25 °C:27 °C in LL). a, b, Behaviour of flies with
spatially restricted IR25a knockdown mediated by IR25a-gal4 (a),
ChO specific F-gal4, and nompC-gal4 (b) driven IR25a-RNAi expression,
respectively. Control flies are UAS-dicer2/Y; IR25a-gal4/+ (a), and
UAS-dicer2/Y;+/+; F-gal4/+ or UAS-dicer2/Y;+/+; nompC-gal4/+ (b).
Test flies carry the same transgenes, but in addition one or two copies of
the IR25a-RNAi line indicated. Actograms and daily averages as described
for Fig. 2a. c, Progeny of the respective UAS-IR25a-RNAi lines crossed to
y w (left three columns) and flies from (a, b) and the other gal4 drivers
indicated, were exposed to the same LL and temperature cycle conditions
used in Fig. 2a. As controls, UAS-dicer, gal4 driver lines were crossed to
y w and F1 males containing UAS-dicer/Y and the respective gal4/+ were
tested. Numbers of analysed individuals (n) are indicated above each
column. Entrainment was quantified as in Fig. 2c. ****P < 0.0001,
one-way ANOVA followed by Bonferroni correction.

Extended Data Figure 7 | Rescue of TIM oscillations in clock neurons
during low-amplitude temperature cycles and normal TIM oscillations
during LD and high-amplitude temperature cycles. a, b, TIM levels in
clock neurons during LL 25 °C:27 °C temperature cycles at the indicated
time points (ZT) in the genotypes indicated. At least 8 brain hemispheres
per time point were analysed for each genotype. Scale bars, 10 µm.
Data in b are mean ± s.e.m. c, Quantification of TIM levels in clock
neurons during LD (25 °C) in Canton S and IR25a−/− mutant brains.
d, TIM oscillations in different clock-neuronal groups in IR25a−/−
are restored in 25 °C:29 °C temperature cycles in LL. At least 8 brain
hemispheres per time point were analysed for each genotype and
condition. Error bars indicate s.e.m.

Extended Data Figure 8 | Ectopic expression and heat responses of TRPA1
and IR8a in l-LNv clock neurons. a, Whole-cell current clamp recordings
of Pdf-gal4/UAS-TrpA1; Pdf-RFP (top trace, red) and Pdf-gal4/UAS-IR8a-RFP
(bottom trace, black) brains exposed to a temperature ramp from
18 °C to 30 °C and back to 18 °C. Note the additional depolarization of the
TrpA1 neuron at higher temperatures. b, Compared to control (Fig. 4),
recordings from TrpA1-expressing neurons show a large increase in firing
rate with temperature which the IR8a expressing neurons do not.
c, d, In comparison to control neurons (data taken from Fig. 4) the
membrane potential of TrpA1 expressing neurons is more positive at 30 °C
(open bars) and the input resistance is also significantly reduced in TrpA1
at 18 °C. e, f, The firing rate at 18 °C is higher for IR8a neurons but only the
Q10 of TrpA1 is different to control. Bars are means and whiskers s.e.m.,
n indicated in bars, *P < 0.05, ***P < 0.001, ANOVA followed by Tukey test.

Extended Data Table 1 | Mass spectrometry data from fly heads from three different genotypes


UNIQUE PEPTIDES % SEQ. MASCOT EMPAI PEPTIDE SEQUENCES 
COVERAGE SCORE SCORE 
FLYBASE ID GENE DESCRIPTION FSNS FSNH NEG FSNS FSNH NEG FSNS_FSNH FSNS FSNH FSNS FSNH 
FBpp0298823nocte no circadian 22 17° 410.87 8.79 1.99 1399.5 713 0.30 0.20LSASTTSWQR,LGYEEYGK,GASVGGS LGYEEYGK,GASVGGSSGYGR, 
temperature SGYGR,SISGGYVQR,QPVGTGSAGGS QPVGTGSAGGSGSGGSGR,QA 
entrainment, GSGGSGR,QDDIDFTK,QAQALPR,GYA QALPR,GYAGSSGGSSVGSGS 
isoform C GSSGGSSVGSGSSYR,GGVSGGGGGV SYR,RQPVGTGSAGGSGSGGS 


SGGAQANAGQGR,RQPVGTGSAGGS GR,GGVSGGGGGVSGGAQAN 

GSGGSGR,NASDWGSSR,TLTTDPMPT AGQGR, TLTTDPMPTGILR, TSE 
QILR, TSESETDLDKTK,QQQQQQQQLP SETDLDKTK,QQQQQQQQLPR, 
R,SPLSADMSLGLAK,KLQELEMK,SAS SPLSADMSLGLAK,KLQELEMK 
ASSAFDSNSR,EQAAAAAVAAQR,LGF ,SASASSAFDSNSR,EQAAAAA 
SFGDDPTTPLK,FTALDINR,KIESCAVV VAAQR,LGFSFGDDPTTPLK,FT 


GGEK,SASPAVVGSGSFR ALDINR,SASPAVVGSGSFR 
FBpp0079064 ninaCc neither inactivation 20 8 2 15.66 5.8 1.27 447 102.5 0.25 0.09AILMLVNAGTPVNNDSTR,MYPEDLAAL LPFDEFLR,TALDNLLTKPDGLF 
nor afterpotentialC, ENPVDENIIESLR,AMFQIIR,LCDFGLSR, YIIDDASR,TLYKEPELFVDR,SDI 
isoform B TLYKEPELFVDR,SSLDESIMLMFTNQLT AEMLELSR,QYTTEEAR,SCQD 


K,DAVASTLYSR,YQFLAFDFDEPVEMT QDLIMDR,LVDFIINR,AAIELNR 
K,LPFDEFLR,TALDNLLTKPDGLFYIIDD 
ASR, YAEVENTDIVSR, YY NDEFLAR,MG 
ESDNIYNQGYFR,SDIAEMLELSR,AFTDI 
NR,QYTTEEAR,ADLEYKPR,SCQDQDLI 
MDR,LVDFIINR,AAIELNR, 

FBpp0072672 alpha-Spec alpha spectrin, 46 7 221.61 3.73 0.95 13175 81.5 0.36 0.04VSTLGAEAQR,LLDSYDLQR,QNQINSQ QEAFLANEDLGDSLDSVEALIK, 

isoformA YDNLLALAR, DLIGVQNLIK,FATDDSYLD LLNVISSGENMLK,ILETVEDIQE 

PTNLNGK,QEAFLANEDLGDSLDSVEAL R,ALNQAWAELK, YAALAAPMG 
IK,_LMDVSNLGVPEIEQR,VTEVNQLADK, ER,IQTQMQDLNEK,LNEACQQ 
DLTGVQNLK,CNSIEEIR,DQPFASDDIR, QQFNR 
DLASVQALQR,QQETPVVDITGK,LLAM 
QEQFR,LNEACQQQQFNR,FIESGHFDA 
DNIR,DADETVAWIAEK,QGFVPAAYIK, 
MQEIVVLWETLVQASDK, |QSVLAMGGN 
LIDK,RAALQEK,LLNVISSGENMLK,FDD 
FNDDLK,QAEIANYWQSLTTK, YAALAAP. 
MGER,DVVLSSDDYGR,DVAGAEALLER 
,>MQEIVVLWETLVQASDKK,LLVGSDDY 
GR,LGDEQTLQQFSR, ILYEQCMDLQLF 
YR,LQAASEESYRDPTNLQAK,DLEDEA 
AWIR,QLLEDSNR,EKEPIAASTNR,QLD 
ETANR.ALDIFATK,ETENVQSYEEIENAF 
R,AlISADELAK ALAALDQK,|ILETVEDIQ 
ER,ALNQAWAELK,NKEGNLSAR,|QTQ 
MQDLNEK,LIDGQHYAADDVAQR,DADE 


IENWIAEK 
FBp| 2 1 0 0.14 AQSDSTAVAASR 
LR, EDVGRDEA 
MLFDANR,MLDTMTPGK 
DNR, AQSDSTAVAA 
ELAEEAER,LKQET 
R,EDNFGAC! 
16 5 0 0 712 44 0.14 ;DIQTA 
FBpp0072127Ca-P60A calcium ATPase at 13 2 015.49 2.55 0 411.33 345 0.21 0.03VIVITGDNK,EVFDSIVR,TGTLTTNQMSV TGTLTTNQMSVSR,NILFSGTNV 
60A, isoform H SR,YGPNELPTEEGK,EFTLEFSR,LNSF AAGK 
SVNK,FSIPVVLLDETLK,SAAEMVLADD 
NFSSIVSAVEEGR, TVEQSLNFFGTDPE 
R,EFDDLSPTEQK,VGEATETALIVLAEK, 
18 0 014.12 0 0 249 0.41 
MENQNAE! 
FBpp0312078 sesB stress-sensitive B, 12 7 23746 22.07 5.69 202.5 62 1.24 0O.50GTGGAFVLVLYDEIK,YFPTQALNFAFK, YFPTQALNFAFK,EFTGLGNCLT 
isoform B EFTGLGNCLTK,EQGFSSFWR,GMVDC K,EQGFSSFWR,TAVAPIER,GA 


FIR,GMLPDPK, TAVAPIER, QVFLGGVDK FSNILR,QEGTGAFFK,SDGIVGL. 
»ATEVIYK,GAFSNILR,QEGTGAFFK,SD YR 


FBpp008848 1 Syn synapsin, i 1F 9 ie) 0 ie) 0 0.32 
8 121.07 266 2.66 32 V 
LK,FDMNS 
VNR,DVDFSVLTK,VILADNSTIPK E 
SGILPAQIFDGFPR 
70 (2)01289 6 fo} ) te) 0 0.09 B LFE 
G LNELENIDDELEKE V 
R EGNLEDEEK,|PALYEGDLMNE 
DEVLE DEDDVIEDVTSK 
FBpp0081031Dap160 dynamin associated 6 2 0 62 1.64 te) 162 24 0.14 0.03DTSMSEMSQLK,AELSALITK,KEDINTN SGYLTGSQAR,LLQLTQER 
protein 160, isoform DVQMSELK,ALQPQAGFVTGAQAK,LLQ 
A LTQER, YTQVFNANDR 
FBpp0293349 CG43078 CG43078, isoformB 5 2 1316 1.17 O07 80.5 89 0.07 0.02VVMETIDDDEFFLR,QYISEAIR,GISEDNI QYISEAIR,AAELDDNEDVGPR 
QLR,QFAEFDEENR,AELDDNEDVGPR 
FBpp0085357 Ars2 Ars2, isoform D 4 2 0 551 2.23 te) 137 72 0.13 0.06VAIADPLVER,FVQANTQELAK.VTNNDV VDSSQADALIR,VAIADPLVER 
FBpp0290 4 1 0 463 099 0 92 23 
3 io} 0) o} 0 45 
FBpp0307 1 0 0.64 0 52 QLEKLK QLEKLK 


gmr-gal4 > UAS-Flag-Strep-Nocte-Strep (FSNS), gmr-gal4 > UAS-Flag-Strep-Nocte-HA (FSNH) and control gmr-gal4 (driver only, NEG). Data show the numbers of unique peptides, the % protein sequence
coverage, Mascot scores (Matrix Science), EMPAI (empirical abundance index) scores and the peptide sequences derived from the Mascot search engine. Data were compared using Protein Centre
(Thermo). Black entries are high confidence hits and grey entries are lower confidence based on prior knowledge of known contaminants*“. nocte mutants show defects in ChO morphology, pointing 
to a structural role of NOCTE in ChO cilia?. Consequently, the majority of the identified proteins (10/16) likely regulate function and dynamics of the ChO neuron cilia. As we were mainly interested in 
identifying potential temperature receptors, we focused on other NOCTE-interacting proteins, particularly on Ionotropic Receptor 25a (IR25a).
Extended Data Table 2 | Rhythm analysis of control and IR25a mutant flies under free running (DD) conditions at different ambient temperatures

Genotype       °C    n     % Rhythmic    Period (h) ± SEM
Canton-S       18    32    84            23.7 ± 0.2
IR25a mutant   18    32    75            24.1 ± 0.3
rescue         18    28    60            23.6 ± 0.1
per mutant     18    22    82            26.8 ± 0.1
Canton-S       25    45    96            24.2 ± 0.1
IR25a mutant   25    44    75            24.0 ± 0.1
rescue         25    30    97            23.6 ± 0.1
per mutant     25    26    38            28.4 ± 0.2
Canton-S       29    60    100           23.4 ± 0.1
IR25a mutant   29    39    77            23.4 ± 0.1
rescue         29    27    100           23.8 ± 0.1
per mutant     29    54    52            29.6 ± 0.4

Period values were calculated using autocorrelation as described in*°. Flies with a rhythm statistics (RS) value >1.5 were considered rhythmic*°. 
Fungal pathogen uses sex pheromone receptor for
chemotropic sensing of host plant signals

David Turrà1, Mennat El Ghalid1, Federico Rossi1 & Antonio Di Pietro1


For more than a century, fungal pathogens and symbionts have 
been known to orient hyphal growth towards chemical stimuli 
from the host plant1,2. However, the nature of the plant signals
as well as the mechanisms underlying the chemotropic response
have remained elusive3. Here we show that directed growth of the
soil-inhabiting plant pathogen Fusarium oxysporum towards
the roots of the host tomato (Solanum lycopersicum) is triggered
by the catalytic activity of secreted class III peroxidases, a family
of haem-containing enzymes present in all land plants4. The
chemotropic response requires conserved elements of the fungal cell
integrity mitogen-activated protein kinase (MAPK) cascade5 and the
seven-pass transmembrane protein Ste2, a functional homologue of
the Saccharomyces cerevisiae sex pheromone α receptor6. We further
show that directed hyphal growth of F. oxysporum towards nutrient 
sources such as sugars and amino acids is governed by a functionally 
distinct MAPK cascade. These results reveal a potentially conserved 
chemotropic mechanism in root-colonizing fungi, and suggest a new 
function for the fungal pheromone-sensing machinery in locating 
plant hosts in a complex environment such as the soil. 
Root-colonizing fungi have a dramatic impact on plant health7.
Beneficial symbionts such as mycorrhiza promote plant growth by
supplying nutrients and microelements, while soil-borne pathogens
provoke devastating yield losses and are highly persistent and
difficult to control. F. oxysporum causes vascular wilt disease in over
100 field and greenhouse crops8. Infectious hyphae penetrate the roots
preferentially through natural openings at the junctions of epidermal
cells9, indicating that the fungus can sense and grow towards chemical
signals from the host plant. To learn more about the underlying
mechanism, we developed a quantitative chemotropism assay on
agar plates (Extended Data Fig. 1a–c). Microconidia of F. oxysporum
exposed to a gradient of glutamate (Glu) produced significantly more
germ tubes pointing towards the nutrient source than towards the
solvent control, resulting in positive chemotropism (Extended Data
Fig. 1d and Fig. 1a). An exposure time of 2 h was sufficient to induce
a chemotropic response, indicating rapid reorientation of the hyphal
growth axis towards the new gradient. Different nitrogen and carbon
sources such as Glu, Asp or glucose elicited a chemotropic response,
while others such as Gln, Met, ammonium or galactose did not
(Fig. 1b). Importantly, germ tubes growing towards the chemoattractant
did not differ in length from those growing towards the solvent,
ruling out a bias from growth speed (Extended Data Fig. 1e). Thus,
F. oxysporum responds rapidly and specifically to nutrients by redirecting
hyphal growth towards the chemoattractant gradient.

In the model fungi S. cerevisiae and Neurospora crassa, chemotropism
towards a mating partner is mediated by opposite gradients of diffusible
peptide sex pheromones6,10,11. Although no sexual cycle has yet
been reported in F. oxysporum, its genome encodes a putative protein
with the characteristic hallmarks of fungal α-pheromone precursors,
containing ten α-pheromone decapeptide repeats with near-identical
sequence (Extended Data Fig. 1f, g). Synthetic α-pheromone from either
Figure 1 | F. oxysporum exhibits chemotropic growth towards different
compounds. a, Germination of microconidia over time. Germ tube
emergence sites are visualized by Triticum vulgaris lectin-fluorescein
isothiocyanate (FITC) staining. DIC, differential interference contrast.
Scale bar, 5 µm. b, Directed growth of germ tubes after 13 h exposure to a
gradient of the indicated compounds. Gluc, glucose; Gal, galactose; Glyc,
glycerol; Cel, cellulose; Pec, pectin (versus solvent control, *P < 0.0001).




c, Directed growth towards a gradient of synthetic α-pheromone (α-pher)
of S. cerevisiae (S. c.) or of F. oxysporum (F. o.), either untreated (C),
boiled (100 °C) or treated with trypsin (Trp), proteinase K (PK), boiled
PK (PK 100 °C) or PK plus its inhibitor phenylmethanesulfonylfluoride
(PK + PMSF); or of α-pheromone analogues (D-Ala1,2) or (D-Ala6,7) (versus
untreated, *P < 0.0001). b, c, Data are presented as the mean from two
experiments. n = 500 germ tubes. Error bars show standard deviation (s.d.).


1Departamento de Genética, Campus de Excelencia Internacional Agroalimentario ceiA3, Universidad de Córdoba, 14071 Córdoba, Spain.
Figure 2 | Secreted tomato root peroxidases elicit fungal chemotropism.
a, Directed growth of F. oxysporum towards tomato roots (TR) or root
exudate (RE) (versus H2O, *P < 0.0001). b, c, Secretion of peroxidase
activity (b) and its spatial distribution (c) in tomato roots was visualized
by staining with 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid)
(ABTS) plus H2O2. Scale bars, 1 cm (b) and 1 mm (c). d, e, Detail of the
root section marked in c, showing colonization by F. oxysporum expressing
green fluorescent protein (GFP). Scale bar, 250 µm. Experiments were
performed four times with similar results. f, Peroxidase enzymatic activity
is required for chemoattraction. Directed growth of germ tubes towards
root exudate or a gradient of 4 µM HRP, either untreated (C), boiled


F. oxysporum or S. cerevisiae elicited a robust chemotropic response that
was largely abolished by protease treatment or alanine substitution of two
conserved residues (Gly6 and Gln7), indicating its specificity (Fig. 1c).

We next asked whether F. oxysporum exhibits directed growth 
towards the host plant. Both tomato roots and root exudate induced 
a significant chemotropic response (Fig. 2a). Chemoattractant activity
of root exudate was sensitive to proteinase K treatment, partitioned
into the water phase after ethyl acetate extraction and resided
predominantly in the molecular weight fraction between 30 and
50 kilodaltons (kDa), suggesting that it originates from one or several
proteins (Extended Data Fig. 2a). Separation by anion exchange chromatography
(Extended Data Fig. 2b, c) and SDS–polyacrylamide gel
electrophoresis (SDS-PAGE) identified two protein bands in the chemotropically
active fractions that were absent from the inactive ones and
elicited a significant chemotropic response (Extended Data Fig. 2d, e).
Analysis by in-gel tryptic digestion followed by liquid chromatography-
electrospray ionization-tandem mass spectrometry (LC-ESI-MS/MS)
identified three tomato proteins, TMP1, TMP2 and CEVI-1 (Extended 
Data Fig. 3a). They belong to class III peroxidases, secreted haem- 
containing oxidoreductases present in all land plants, which catalyse 
the reductive cleavage of hydrogen peroxide by an electron donor”, 
are encoded by multigene families, and function in diverse physiolog- 
ical processes such as cell wall modification and pathogen defence’. 
Arabidopsis thaliana has over 70 members of this family, many of which 
are expressed and secreted in roots". 

We observed a strong gradient of peroxidase activity exuded by 
tomato roots into the adjacent medium (Fig. 2b). The highest enzymatic
activity was associated with the root hair zone, which also exhibits
the highest density of colonization by F. oxysporum (Fig. 2c–e).
Secreted peroxidase activity differed considerably between root exudates 
(100 °C), or in the presence of the peroxidase inhibitor salicylhydroxamic
acid (SHAM) or the oxygen radical scavenger (+)-sodium L-ascorbate
(Asc) (versus untreated, *P < 0.0001). g, h, Chemoattractant activity of
heterologously expressed tomato peroxidase requires an intact catalytic site.
g, Enzymatic activity of 56 nM recombinant tomato peroxidases CEVI-1,
TMP2 and TMP2(R38S,H42E), indicated as units ml−1. Data are presented
as the mean from two experiments, each with two technical replicates
(versus TMP2, *P < 0.005). h, Directed growth towards a gradient of
169 nM recombinant CEVI-1, TMP2 or TMP2(R38S,H42E) (versus TMP2,
*P < 0.0001). a, f, h, Data are presented as the mean from two experiments.
n = 500 germ tubes. Error bars show s.d.


collected from different tomato plants, reflecting the multiplicity and 
complex regulation of plant class III peroxidase genes*. Importantly, 
enzymatic activity in individual root exudates correlated significantly 
with fungal chemoattraction (Extended Data Fig. 2f, g). We next tested 
horseradish peroxidase (HRP), which shares 39%, 37% and 49% amino 
acid identity with tomato TMP1, TMP2 and CEVI-1, respectively 
(Extended Data Fig. 3b), and whose molecular structure and cata- 
lytic properties are well characterized'”. Commercial HRP triggered a 
robust chemotropic response in F. oxysporum (Fig. 2f). Chemotropism
induced by HRP and tomato root exudate was abolished when peroxidase
enzymatic activity was eliminated by boiling or by addition of the
specific inhibitor salicylhydroxamic acid (SHAM), or in the presence
of the oxygen radical scavenger ascorbate (Extended Data Fig. 2h and
Fig. 2f). However, these inhibitors did not prevent chemotropism
towards glucose or α-pheromone (Extended Data Fig. 2i).

To confirm the importance of peroxidase catalytic activity in the 
process of chemoattraction, we heterologously expressed tomato perox- 
idases CEVI-1 and TMP2, as well as a point-mutated version of TMP2 
in which the conserved Arg 38 and His 42 residues!” were substituted 
by Ser and Glu, respectively (Extended Data Fig. 3b). Recombinant 
catalytically active CEVI-1 and TMP2, but not catalytically inactive 
TMP2(R38S,H42E), triggered a robust chemotropic response in F.
oxysporum (Fig. 2g, h). Collectively, these findings demonstrate that
secreted root peroxidases elicit a chemotropic response in F. oxysporum
through a mechanism that requires enzymatic activity.

We next asked whether directed hyphal growth towards different chemoattractants
is mediated by common or distinct cellular mechanisms.
Both α-pheromone and Glu exhibited a bell-shaped dose-response
curve with a gradual decrease at higher concentrations (Fig. 3a),
which could be explained by receptor saturation, as previously shown
Figure 3 | Chemotropism towards nutrients and α-pheromone is
governed by distinct MAPK cascades. a, Dose-response curves for
chemotropism towards α-pheromone (α-pher) or Glu. b, Elements of
the F. oxysporum Fmk1 and Mpk1 MAPK cascades. c, d, Directed growth


for the pheromone response in S. cerevisiae11. Remarkably, chemotropic
sensitivity of F. oxysporum to α-pheromone was three orders of
magnitude higher than to Glu, suggesting that the two responses
may be governed by distinct cellular mechanisms. To test this idea, 
we used fungal mutants lacking defined elements of MAPK cascades, 
three-component signalling modules conserved from yeast to humans 
that function in succession to transmit a variety of cellular signals". 
Like most fungi, F. oxysporum has three MAPKs orthologous to S.
cerevisiae Kss1, Mpk1 and Hog1 (ref. 15). The p42/44 MAPK Fmk1,
its upstream MAPKK Ste7 and MAPKKK Ste11, as well as the down- 
stream transcription factor Ste12 (Fig. 3b), function in a conserved 
pathway that governs filamentation in S. cerevisiae and invasive growth 
in plant pathogens'*"!”. Isogenic F. oxysporum mutants lacking fmk1,
ste7, ste11 or ste12 were impaired in chemotropism towards Glu or
glucose, but not α-pheromone (Fig. 3c and Extended Data Fig. 4).
S. cerevisiae mutants lacking the orthologous MAPKs Fus3 or Kss1 also
maintain most of the chemotropic response towards α-pheromone6. To
investigate whether another MAPK mediates chemotropic sensing of
α-pheromone, we used the chemical inhibitors PD98059 and SB202190,
which selectively block p42/44 MAPKs and p38 MAPKs, respectively.
PD98059, but not SB202190, prevented chemotropism towards
α-pheromone (Extended Data Fig. 5a), pointing to a role of the second
p42/44 MAPK, Mpk1. In S. cerevisiae, Mpk1 together with Mkk1/2 and
Bck1 functions in the cell wall integrity (CWI) MAPK module’, which 
is activated during the pheromone response!*. Mutations in CWI path- 
way components have pleiotropic effects on cell wall architecture and 
impair fungal virulence on plants*!>?. Loss of the orthologous genes 
mpk1, mkk2 or bck1 in F. oxysporum (Fig. 3b) led to high sensitivity
to the cell-wall-perturbing compounds Congo red and Calcofluor
white, confirming that they function in the CWI response (Extended 
Data Fig. 5b–g). Interestingly, the mutants were impaired in chemotropism
towards α-pheromone but remained responsive to Glu or
glucose (Fig. 3c and Extended Data Fig. 5h, i). Moreover, a mutant in
of indicated fungal strains towards a gradient of Glu, glucose (Gluc) or
α-pheromone (c), or tomato roots, root exudate or HRP (d) (versus wild
type (WT), *P < 0.0001). a, c, d, Data are presented as the mean from two
experiments. n = 500 germ tubes. Error bars show s.d.


the small G protein Rho1 (ref. 19), located upstream of the CWI MAPK
module’, also failed to grow towards α-pheromone. In S. cerevisiae,
Rho1 mediates localization of key components of the CWI pathway to
the tips of pheromone-induced mating projections”. 

We sought to confirm these results independently using a differ- 
ent chemotropism assay based on the angle of hyphal tip projections 
relative to the chemoattractant gradient. This method was used 
extensively in S. cerevisiae to study chemotropic responses to mating 
pheromones6,10,11. The average cosine of hyphal tip projection angles
was significantly higher when F. oxysporum was exposed to a gradient
of Glu or α-pheromone compared with the water control, confirming
positive chemotropism (Extended Data Fig. 6a, b). The fmk1Δ mutant
was specifically impaired in growth towards Glu while the mpk1Δ
mutant failed to respond to α-pheromone. Taken together, these results
establish that the invasive growth and CWI MAPK pathways have distinct
and complementary roles in chemotropic sensing of nutrients and
sex pheromones. Consistent with this model, a fmk1Δ mpk1Δ double
mutant responded neither to nutrients nor to α-pheromone (Fig. 3c).

We next tested whether these MAPK cascades are required for 
chemotropic growth of F. oxysporum towards the host plant. Mutants
lacking Fmk1, Ste7, Ste11 or Ste12 were not affected in chemotropism
towards tomato roots or root exudate, but those lacking Mpk1, Mkk2,
Bck1 or Rho1 were impaired (Fig. 3d and Extended Data Figs 4g, 5i).
Importantly, CWI components were also essential for the chemotropic 
response to HRP while Fmk1 and Ste12 were not (Fig. 3d and Extended 
Data Fig. 5i). Thus, peroxidase-mediated chemotropism of F. oxysporum 
towards plant roots is specifically governed by the CWI MAPK cascade. 

In S. cerevisiae and N. crassa, chemotropic sensing of α-pheromone
requires the seven-pass transmembrane (7TM) G-protein-coupled
receptor (GPCR) Ste2 or Pre-2, respectively6,10. F. oxysporum has a
putative Ste2 orthologue with seven predicted transmembrane regions
and a topology characteristic of ascomycete α-pheromone receptors
(Extended Data Fig. 7a, b). Loss of Ste2 (Extended Data Fig. 7c, d) 
abolished chemotropism of F. oxysporum towards α-pheromone but not
towards nutrients, confirming that it is a functional homologue of yeast
Ste2 (Fig. 4a). Strikingly, ste2Δ mutants were impaired in chemotropism
towards tomato roots, root exudate and HRP (Fig. 4b). This was unexpected
because Ste2 is generally regarded as a specific receptor for
α-pheromone”". To corroborate further the role of Ste2 in chemotropic
sensing of root chemoattractants, fungal strains were exposed to opposite
gradients of Glu versus either α-pheromone, root exudate or HRP.
The wild-type and the complemented strain failed to display directed
growth, suggesting that chemotropism is annulled in the presence of
two competing gradients (Fig. 4c–e), while ste2Δ grew towards Glu, confirming
its incapacity to sense a competing gradient of α-pheromone,
root exudate or HRP. Loss of Ste2 caused a small but significant
decrease in virulence of F. oxysporum on tomato plants, indicating 
that Ste2-mediated chemotropic sensing of secreted root compounds 
is important for initiation of fungal infection (Extended Data Fig. 7e). 
Interestingly, in a previous study Petunia plants defective in secretion 
of the signalling compound strigolactone were significantly delayed in 
the symbiotic interaction with arbuscular mycorrhizal fungi”. 

Our results reveal a previously unknown ability of the fungal pathogen
F. oxysporum to reorient hyphal growth towards a variety of
chemical signals. On the basis of genetic evidence, we propose that
chemotropism is mediated by distinct MAPK modules: Fmk1 for nutrients
and Mpk1 for sex pheromones and plant compounds (Extended
Data Fig. 8). Remarkably, F. oxysporum uses the same signalling pathway
for chemotropic sensing of mating factors and host cues, including Ste2,
a 7TM GPCR that was previously thought to function specifically in
α-pheromone sensing. How secreted plant peroxidases generate a chemoattractant
signal, and how Ste2 mediates signal sensing and chemotropic
response in concert with the CWI pathway, remain to be determined. 
Since class III peroxidases and fungal MAPK cascades are evolutionarily 
conserved*">, our findings might be of general relevance to the chemo- 
tropic interaction between plants and root-colonizing fungi. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Fungal strain culture and transformation. Fungal strains used in this study 
are listed in Extended Data Table 1. All are derivatives of F. oxysporum f. sp. 
lycopersici isolate 4287 (FGSC 9935). Strain culture and storage were per- 
formed as described!°. Phenotypic analysis of colony growth and invasion of 
cellophane membranes was done as reported”’, Targeted gene replacement with 
the hygromycin resistance cassette and complementation of the mutants by co- 
transformation with the phleomycin resistance cassette were performed as 
reported’. Oligonucleotides used to generate PCR fragments for gene replace- 
ment, mutant identification and complementation are listed in Extended Data 
Table 2. F. oxysporum gene data are available in the Fusarium Comparative
Database at the Broad Institute under the following accession numbers: ste11,
FOXG_09411; ste7, FOXG_05521; fmk1, FOXG_08140; ste12, FOXG_02103; ste2,
FOXG_10633; rho1, FOXG_13835; bck1, FOXG_08078; mkk2, FOXG_02117;
mpk1, FOXG_05092. 

Quantification of fungal chemotropism. Freshly obtained microconidia were 
embedded in 4 ml water agar (WA; 0.5%, w/v) (Oxoid) at a final concentration of 
2.5.x 10° per ml and poured into a standard Petri dish (Extended Data Fig. 2a). A 
central scoring line was drawn on the bottom of the plate, and two parallel wells 
were cut into the WA layer on both sides at 5 mm distance from the scoring line. 
Then, 50 µl of the test compound solution or the solvent control was added to
the wells at both sides of the scoring line. In gradient competition experiments, 
solutions of the two different test compounds were applied at both sides of the 
scoring line. Tested compounds and standard concentrations were: sodium glu- 
tamate (Glu), glutamine (Gln), sodium aspartate (Asp), methionine (Met), all 
at 295 mM; ammonium nitrate (NH4,), glucose (Gluc), galactose (Gal), glycerol 
(Glyc), all at 50 mM; or cellulose (Cel), pectin (Pec), all at 1% (w/v). Sterile water 
or methanol were used as solvent controls. To measure chemotropism towards 
tomato plants, the root of a 2-week-old tomato seedling was placed directly on 
top of one of the wells. A sterile metal string was placed on the opposite well 
as a control. Plates were maintained in a plastic box at 28°C in the dark for the 
indicated time periods (13 h unless otherwise stated). Chemotropism of conidial
germ tubes was quantified with an Olympus binocular microscope (200×
magnification), by counting the number of hyphal tips pointing towards the test
compound and those pointing towards the solvent control. The chemotropic
index was calculated as $(H_{\text{test}} - H_{\text{solv}})/H_{\text{total}} \times 100$, where $H_{\text{test}}$ is the number of
hyphae growing towards the test compound, $H_{\text{solv}}$ is the number of hyphae growing
towards the solvent control, and $H_{\text{total}}$ is the total number of hyphae counted.
For each test compound a total of 500 hyphal tips were scored. All experi- 
ments were performed at least twice. Statistical analysis was conducted using 
t-test. 
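
The chemotropic index defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' analysis script: the function name `chemotropic_index` and the example counts are hypothetical, and the optional `h_total` argument defaults to the sum of the two counts on the assumption that every scored tip points either towards the test compound or towards the solvent.

```python
from typing import Optional


def chemotropic_index(h_test: int, h_solv: int, h_total: Optional[int] = None) -> float:
    """Return (H_test - H_solv) / H_total * 100, in percent.

    h_test  -- hyphal tips pointing towards the test compound
    h_solv  -- hyphal tips pointing towards the solvent control
    h_total -- total tips counted (assumed to be h_test + h_solv if omitted)
    """
    if h_total is None:
        h_total = h_test + h_solv
    if h_total <= 0:
        raise ValueError("at least one hyphal tip must be counted")
    return 100.0 * (h_test - h_solv) / h_total


# Hypothetical counts: 300 of 500 scored tips point towards the test compound.
print(chemotropic_index(300, 200))  # prints 20.0
```

An index of 0 corresponds to no directional preference; positive values indicate growth towards the test compound and negative values growth away from it.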

For the hyphal tip projection assay, light microscopy photographs of chemotropism
plate assays were recorded in a Leica DMR microscope (200× magnification)
using a Leica DFC 300 FX digital camera, and the angle (in degrees) of the
hyphal tip relative to the chemoattractant gradient was measured using the ImageJ
software‘, For each test compound a total of 300 hyphal tip projection angles 
were measured. All experiments were performed at least twice. Length of germ 
tubes growing towards the test compound or the solvent control was measured 
using ImageJ. For visual monitoring of compound diffusion, 50 µl of a 1% (w/v)
solution of Congo red in water was added to the test compound well and 50 µl of
water into the solvent control well. Plates were incubated at 28 °C, and dye diffusion 
was documented after different time periods in a Leica MZ FLIII fluorescence 
stereomicroscope using a Leica DFC 300 FX digital camera. Dye intensity was 
quantified with the KODAK 1D Image Analysis software. 
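
The hyphal tip projection assay summarizes the measured angles as an average cosine. A minimal sketch of that summary statistic follows; the function name `mean_cosine` and the example angles are hypothetical, and the angle convention (0° pointing straight up the gradient, 180° pointing directly away) is an assumption made for illustration rather than a detail stated in the text.

```python
import math
from typing import Sequence


def mean_cosine(angles_deg: Sequence[float]) -> float:
    """Average cosine of hyphal tip angles measured in degrees.

    Assumed convention: 0 degrees = tip aligned with the gradient,
    180 degrees = tip pointing directly away from it.
    """
    if not angles_deg:
        raise ValueError("at least one angle is required")
    return sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)


# Hypothetical measurements for a handful of hyphal tips.
example_angles = [10.0, 35.0, 60.0, 120.0, 20.0]
print(round(mean_cosine(example_angles), 3))
```

Under this convention a value near 1 means most tips point up the gradient, a value near 0 means no net orientation, and a negative value means net growth away from the source.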

The MAPK inhibitors PD98059 and SB202190 (Calbiochem) were added to the
WA medium at a final concentration of 10 or 30 µM, respectively, before adding the
chemoattractant compound. Commercial horseradish peroxidase (HRP; Sigma)
was assayed at a standard concentration of 4 µM. Peroxidase inhibitors/scavengers
salicylhydroxamic acid (SH), thiourea (TU) and (+)-sodium L-ascorbate
(Asc) (Sigma) were added directly to the chemoattractant solution at a final concentration
of 60 mM, 60 mM and 160 mM, respectively. Synthetic F. oxysporum
α-pheromone, its analogues (D-Ala1,2) and (D-Ala6,7), and S. cerevisiae α-factor
were obtained from GenScript (Piscataway). Lyophilized peptides were dissolved
in 50% (v/v) methanol in water and assayed at a standard concentration of
378 µM. To test the effect of different treatments on the chemoattractant activity of
F. oxysporum α-pheromone, the peptide was incubated for 10 min at 100 °C, or
for 30 min at 37 °C with 1 mg ml−1 trypsin, 1 mg ml−1 proteinase K, 1 mg ml−1
heat-denatured (10 min at 100 °C) proteinase K, or 1 mg ml−1 proteinase K plus
1 mM phenylmethylsulfonyl fluoride (PMSF) (all from Sigma).

Light and fluorescence microscopy. Low-resolution imaging was performed 
using a Lumar.V12 fluorescence stereomicroscope (Zeiss). Wide-field fluo- 
rescence imaging was performed using a Zeiss Axio Imager M2 microscope 
equipped with a Photometrics Evolve EMCCD camera. To visualize growth of
F. oxysporum on tomato roots, freshly obtained microconidia of the wild-type- 
GFP strain!® were embedded in WA as described earlier. One-day-old germi- 
nated tomato seeds were placed on top of the medium and incubated for 2 days 
at 28°C before observation in the microscope. 

To visualize sites of germ tube emergence, F. oxysporum microconidia were
stained with 50 µg ml−1 fluorescein isothiocyanate-labelled lectin from Triticum
vulgaris (WGA-FITC) (Sigma) in PBS containing 12.5 µM CaCl2 and 12.5 µM
MnCl2.
Identification of the root chemoattractant. Tomato seeds (cultivar Monica) 
were provided as a gift by Syngenta Seeds. Seeds were surface sterilized”, planted 
in moist vermiculite and maintained in a growth chamber (15/9 h light/dark 
photoperiod, 28°C) until the plants reached the second true leaf stage. Roots 
were washed carefully to remove the adhering substrate, placed in sterile water 
and kept at 25 °C for 48 h. The collected root exudate was filtered through a
0.22-µm Millipore membrane and stored at −20 °C until use. To measure fresh
root weight, roots were cut from individual plants, gently blotted with a paper 
towel and weighed. 

Filter-sterilized root exudates were partitioned with ethyl acetate to obtain an 
ethyl acetate fraction and a water fraction (WF). The WF was further separated 
by centrifugal ultrafiltration (MWCO 10 kDa; 30 kDa; 50 kDa) (Corning). The 
30-50 kDa fraction was applied to a Hitrap QFF anion exchange chromatography 
column on an AKTA purifier (GE Healthcare), and proteins were eluted with a 
linear gradient of NaCl. Fractions were desalted by dialysis, tested for chemotropic
activity as described earlier and analysed by SDS–polyacrylamide (10%) gel
electrophoresis, followed by Coomassie blue staining. Protein bands were eluted
from the gel and tested for chemoattractant activity. Proteins of interest were subjected
to tryptic digestion and analysed by liquid chromatography-electrospray
ionization-mass spectrometry (LC-ESI-MS). Identification of tomato proteins
was carried out at the Protein Micro-Analysis Core Facility of the Biozentrum, 
Innsbruck Medical University. 

Heterologous expression of tomato peroxidases. Recombinant tomato peroxidase 
proteins were produced in Escherichia coli strain BL21 (DE3) ung-151 transformed 
with plasmids pET28a-Cevi-1, pET28a-Tmp2 and pET28a-Tmp2(R38A,H42A), 
respectively. Plasmids were obtained by subcloning the corresponding com- 
plementary DNA fragments lacking the sequence encoding the signal peptide 
in the vector pET28a(+), using XhoI and NdeI. Solubilization and re-folding
of recombinant peroxidases was performed as described”. Purification of the 
recombinant proteins was performed on an AKTA purifier using a Ni-NTA 
chromatography column. 

Peroxidase enzymatic activity assays. To visualize peroxidase activity secreted 
by roots, 4-day-old tomato seedlings were placed on 0.5% WA supplemented with 
0.91 mM 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) (Sigma)
and 2.5 mM H2O2 (J. T. Baker) (WA-ABTS), and incubated for 45 min at 28 °C.
Seedlings were carefully removed from WA-ABTS, placed in a Petri dish containing 
0.5% agarose (w/v) and imaged with a stereo microscope. 

Peroxidase activity assays were carried out in 96-well microtitre plates. The reaction
mixture contained 0.91 mM ABTS and 2.5 mM H2O2 in phosphate-citrate buffer
(51 mM Na2HPO4, 24 mM citric acid, pH 5.6) in a final volume of 150 µl. Where
applicable, peroxidase inhibitors/scavengers (75 µM TU and SH or 250 µM Asc)
were pre-incubated with the buffer for 5 min before adding ABTS and H2O2. For
each reaction, a blank containing heat-inactivated (20 min boiling) peroxidase was
included. Reactions were incubated at 28 °C and absorbance at 405 nm was measured
at different time intervals in a Spectrafluor Plus microplate reader (Tecan).
Peroxidase activity was calculated in units per ml, using the formula
$((\Delta A_{405\,\text{nm}}/\text{min test} - \Delta A_{405\,\text{nm}}/\text{min blank}) \times (\text{total volume assay}) \times (\text{dilution factor})) / ((\text{millimolar extinction coefficient of oxidized ABTS at 405 nm}) \times (\text{volume enzyme used}))$.
Statistical analysis was conducted using t-tests.
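
The activity formula above translates directly into code. The sketch below is illustrative only: the function and parameter names are hypothetical, the extinction coefficient is passed in explicitly because the text does not state the value used, and the figure of 36.8 in the example is a commonly quoted value that is assumed here rather than taken from the paper.

```python
def peroxidase_units_per_ml(
    rate_test: float,         # delta A405 per min, test reaction
    rate_blank: float,        # delta A405 per min, heat-inactivated blank
    total_volume_ml: float,   # total assay volume
    dilution_factor: float,   # dilution of the enzyme sample
    ext_coeff_mM: float,      # millimolar extinction coefficient of oxidized ABTS at 405 nm
    enzyme_volume_ml: float,  # volume of enzyme added to the assay
) -> float:
    """Units per ml following the formula given in the Methods."""
    if ext_coeff_mM <= 0 or enzyme_volume_ml <= 0:
        raise ValueError("extinction coefficient and enzyme volume must be positive")
    numerator = (rate_test - rate_blank) * total_volume_ml * dilution_factor
    return numerator / (ext_coeff_mM * enzyme_volume_ml)


# Hypothetical readings for a 150 µl assay; 36.8 is an assumed coefficient.
activity = peroxidase_units_per_ml(
    rate_test=0.50,
    rate_blank=0.10,
    total_volume_ml=0.150,
    dilution_factor=1.0,
    ext_coeff_mM=36.8,
    enzyme_volume_ml=0.010,
)
print(f"{activity:.3f} units per ml")
```

Subtracting the blank rate removes any non-enzymatic oxidation of ABTS, so a sample with no active peroxidase returns an activity of zero.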

Tomato seedling infection assay. Surface-sterilized tomato seeds (cultivar 
Monica) were transferred to sterile glass tubes containing WA or WA supplemented 
with 2.5 x 10° microconidia per ml of the different F. oxysporum strains. Plants
were maintained in a growth chamber (15/9 h light/dark cycle, 28 °C). Survival
was recorded daily, calculated by the Kaplan-Meier method and compared among 
groups using the log-rank test. Virulence experiments were conducted with 40 
plants per treatment and performed twice with similar results. 
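
For readers unfamiliar with the Kaplan-Meier method named above, the following is a minimal pure-Python sketch of the product-limit estimator. It is not the software used in the study; the function name `kaplan_meier` and the example data are hypothetical, and the log-rank comparison between groups is deliberately left out.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], died: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one death.

    durations -- day on which each plant died or was last observed
    died      -- True if the plant died on that day, False if censored
                 (for example, still alive when the experiment ended)
    """
    if len(durations) != len(died):
        raise ValueError("durations and died must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, died) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: four plants, one censored on day 2.
for day, s in kaplan_meier([1, 2, 2, 3], [True, True, False, True]):
    print(day, round(s, 3))
```

Censored plants reduce the number at risk for later time points without lowering the survival estimate themselves, which is what distinguishes this estimator from a simple fraction of survivors.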

Bioinformatic and statistical analysis. F. oxysporum predicted proteins Ste11,
Ste7, Bck1, Mkk2, Mpk1 and Ste2 were identified by BLASTp search of the
Fusarium Comparative Database at the Broad Institute
(http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html), using the
amino acid sequences of the S. cerevisiae proteins. Protein alignments and phylogenetic
comparisons were done using ClustalW (ref. 26) and MEGA5 (ref. 27).
Protein domain predictions were made using the Prosite database (ExPASy; Swiss
Institute of Bioinformatics). Prediction of Ste2 transmembrane helices was done
with SOSUI’®. Linear regression analysis was conducted using MedCalc v. 12.1.0 
(MedCalc Software). 

Data reporting. No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded to allocation 
during experiments and outcome assessment. 
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Extended Data Figure 1 | Plate assay for quantitative determination of 
directed hyphal growth and identification of an F. oxysporum orthologue
of the S. cerevisiae α-pheromone precursor. a, Schematic representation
of the plate chemotropism assay. Test compound and solvent control are
applied to opposite sides of a Petri dish containing a layer of water agar
with 2.5 x 10° ml−1 F. oxysporum microconidia, at a distance of 0.5 cm
from the central scoring line. Chemotropic index was calculated as
((H_test − H_solv)/H_total × 100), where H_test is the number of hyphae growing
towards the test compound, H_solv is the number of hyphae growing towards
the solvent control, and H_total is the total number of hyphae counted.
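
The chemotropic index defined in panel a is simple enough to sketch directly. The function name and the example counts are hypothetical; the total is passed explicitly because the legend defines it only as the number of hyphae counted.

```python
def chemotropic_index(h_test, h_solv, h_total):
    """Chemotropic index in per cent: ((H_test - H_solv) / H_total) * 100."""
    if h_total <= 0:
        raise ValueError("h_total must be positive")
    return (h_test - h_solv) / h_total * 100


# Hypothetical counts: 300 of 500 hyphae grow towards the test compound.
index_example = chemotropic_index(h_test=300, h_solv=200, h_total=500)
```

An index of 0 corresponds to no directional preference, and positive values indicate growth towards the test compound.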

b, Visualization of compound diffusion and gradient establishment. The 
dye Congo red (1% w/v in water) was loaded into the application well on 
the right side of the scoring line. Diffusion was recorded photographically 
after the indicated time intervals. c, Dye intensity in experiment b was 
measured at the indicated distances from the application well after different 
time intervals, using the Kodak Image Analyzer software. The blue dashed 
line represents the relative position of the scoring line. Mean values were 
calculated from measurements of five individual spots per distance. 

d, Direction of germ tube emergence after 2 h exposure to a gradient of Glu
or the solvent (H2O) was quantitatively determined by lectin-FITC staining
and expressed as chemotropic index (versus H2O, *P < 0.0001). Data are
presented as the mean from two experiments. n = 200 germ tube emergence
sites. e, Lengths of germ tubes exposed for 13 h to a gradient of 1% (w/v)
cellulose (Cel), 55 mM glucose (Gluc), 295 mM Glu or the solvent (H2O)
were measured using the ImageJ software. The mean length of germ tubes
growing towards the nutrient chemoattractants is not significantly different
from that of germ tubes growing towards the solvent. Data are presented as
the mean from two experiments. n = 100 germ tubes. Error bars show s.d.

f, The predicted product of the F. oxysporum α-pheromone precursor gene
(Fusarium Comparative Database accession FOXG_08636) was aligned with
predicted α-pheromone precursors from F. graminearum (FGSG_05061) and
F. verticillioides (FVEG_06038). Conserved residues are indicated with an
asterisk. Predicted KR and RR cleavage signals for KEX2-like endopeptidases
are highlighted in red. Predicted maturation signals characterized by the
presence of XA or XP dipeptide repeats are highlighted in yellow. Predicted
mature α-pheromone decapeptide repeats are highlighted in grey. Coloured
arrowheads indicate differences between the decapeptide repeats at the
third amino acid residue. g, Amino acid alignment of predicted mature
α-pheromone of F. oxysporum with orthologues from ascomycete fungi.
Absolutely and highly conserved residues are shaded in black and in grey,
respectively. Residues replaced with alanines in the (Ala1,2) or (Ala6,7)
analogues (see Fig. 1c) are indicated with asterisks.
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Extended Data Figure 2 | Purification of chemoattractant compounds 
from tomato root exudate reveals secreted peroxidases. a, Chemotropic 
growth of germ tubes towards a gradient of tomato root exudate (RE) either 
untreated (no treat); treated with 1 mg ml−1 proteinase K for 30 min at 37 °C
(PK); extracted to obtain an ethyl acetate fraction (EAF) and a water fraction 
(WF); or the WF subjected to centrifugal ultrafiltration with membranes of 
10, 30 or 50 kDa molecular weight cut-off to obtain fractions < 10, 10-30, 
30-50 and >50, respectively (*P=0.006; ** P< 0.0001, versus untreated). 

b, Anion exchange chromatography profile of fraction 30-50 from a. Obtained
fractions F1-F5 are indicated. c, Directed growth of F. oxysporum germ tubes
towards fractions F1-F5 from b (*P < 0.0001, versus H2O). d, SDS-PAGE of
biologically active fraction F1 and inactive fraction F5, followed by staining 
with Coomassie blue. Protein bands present in the active and absent from 
the inactive fraction (named B1-B5) are indicated by arrowheads. Relative 
positions of molecular weight markers are indicated on the right. e, Directed 
growth of germ tubes towards the proteins eluted from bands B1-B5. 

f, Peroxidase activity of root exudates obtained from 18 individual tomato
plants, indicated as units ml−1 per mg fresh root weight. Data are presented
as the mean of three technical replicates. Error bars show s.d. g, Relationship
between peroxidase enzymatic activity of root exudates and elicited
chemotropic response. Each empty circle represents a root exudate sample
from an individual tomato plant (n = 18). Linear regression (solid line) and
95% mean prediction interval (dashed lines) indicate linear correlation of the
two variables (P < 0.001). h, Specific inhibitors and oxygen radical scavengers
abolish peroxidase enzymatic activity. Activity of 2.5 nM commercial HRP
or 100 µl root exudate was measured in the absence (C) or presence of 75 µM
of the specific inhibitors thiourea (TU) or SHAM, or 250 µM of the
scavenger (+)-sodium L-ascorbate (Asc), and indicated as units ml−1. Data are
presented as the mean of three experiments, each with two technical replicates
(*P < 0.0002, versus C). i, Peroxidase inhibitors and scavengers do not affect
chemotropism towards glucose and α-pheromone. Chemotropic growth of
germ tubes towards a gradient of glucose or α-pheromone, in the absence (C)
or presence of 60 mM SHAM or 160 mM Asc. No significant differences were
observed between treated and untreated samples. a, c, e, g, i, Data represent
the mean from two experiments (a, c, g, i) or one representative experiment
performed twice (e). n = 500 germ tubes. Error bars show s.d.
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a 
Protein   Protein   Peptide               Experimental       Theoretical
name      band      sequence              mass (Da)          mass (Da)
TMP1      2/3       EMVALAGAHTVGFAR       1545.63/1529.55    1528.78
                    AVVDSAIDAETR          1246.51/1246.43    1245.62
CEVI-1    2/3       YASSQSQFFDDFASSMIK    2074.83/2058.67    2057.90
                    LGNIGVLTGTNGEIR       1513.79            1512.83
                    DAASNVGAGGFDIVDDIK    1763.79            1762.84
TMP2      2         EMVALAGAHTVGFAR       1545.60            1528.78
                    LGGQTYTVALGR          1235.40            1234.67
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Extended Data Figure 3 | Identification of chemoattractant proteins 
from tomato root exudate. a, Peptide sequences obtained from protein
bands B2 and B3 after in-gel tryptic digestion followed by LC-ESI-MS/MS.
Masses were calculated by using monoisotopic masses of the occurring
amino acid residues and giving peptide masses as [MH]+. b, Amino acid
sequence alignment of class III tomato peroxidases TMP1 (P15003), TMP2
(P15004) and CEVI-1 (Q9LWA2), and HRP isoenzyme C (HRP_C1A)
(K7ZWW6). Peptides identified in the chemotropically active fraction of
tomato root exudate by LC-ESI-MS/MS are underlined in red. Predicted 
signal peptides are indicated by green boxes. Residues conserved in at least 
three of the four proteins are shaded in black. Conserved catalytic residues 
are indicated by orange boxes. Residues Arg 38 and His 42, which were 
replaced by Ser and Glu, respectively, in the catalytically inactive recombinant 
TMP2(R38S,H42E) protein (see Fig. 2g, h) are marked with blue asterisks. 
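
The monoisotopic mass calculation referred to in panel a can be sketched as follows. This is a minimal illustration using standard monoisotopic residue masses; the function names are hypothetical and no modifications (for example methionine oxidation) are handled.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water added for the free N and C termini
PROTON = 1.00728   # added to give the singly protonated [MH]+ ion


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mh_plus(sequence):
    """Monoisotopic mass of the singly protonated peptide, [MH]+."""
    return peptide_mass(sequence) + PROTON


mass_tmp1 = peptide_mass("AVVDSAIDAETR")
mass_tmp2 = peptide_mass("LGGQTYTVALGR")
```

For AVVDSAIDAETR and LGGQTYTVALGR the neutral masses come to about 1245.62 Da and 1234.67 Da, in line with the theoretical values listed in the table. Adding one proton gives values close to the listed experimental masses.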
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Extended Data Figure 4 | Conserved elements of the invasive growth 
MAPK cascade are required for chemotropism towards glucose. 

a, b, Identification of ste7Δ (a) and ste11Δ (b) deletion mutants. Genomic
DNA of the wild-type strain (WT) and several independent transformants
was used as a template for polymerase chain reaction (PCR) with the primer
pairs ste7PF + Hyg-G (P) and ste7TR + Hyg-Y (T) and ste11PF + Hyg-G (P)
and ste11TR + Hyg-Y (T), respectively. Presence of an amplification
product is consistent with homologous replacement of the target gene.
c, d, Identification of complemented strains obtained from ste7Δ and ste11Δ
mutants. Genomic DNA of independent transformants obtained upon
transformation of the indicated mutants with the wild-type ste7 (c) or ste11
gene (d) was used as a template for PCR with primer pairs ste7PFN + ste7GR
and ste11PFN + ste11GR, respectively. Presence of an amplification product
is consistent with integration of an intact gene copy. e, Elements of the
Fmk1 MAPK pathway are required for invasive growth through cellophane
membranes. Colonies were grown on PDA plates covered with a cellophane
membrane for 2 days at 28 °C (before). The cellophane with the fungal colony
was removed and plates were incubated for an additional day (after). The
experiment was performed twice, each with three plates. Results shown are
from one representative experiment. Scale bar, 2 cm. f, g, Directed growth
of germ tubes of the indicated F. oxysporum strains towards a gradient of
glucose (Gluc) (f), α-pheromone or tomato root exudate (g) (versus wild type
for a given compound, *P < 0.0001). f, g, Data are presented as the mean from
two experiments. n = 500 germ tubes. Error bars show s.d.
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Extended Data Figure 5 | Conserved elements of the CWI MAPK cascade 
are required for chemotropism towards α-pheromone, root exudate and
peroxidase. a, Directed growth of germ tubes of the wild type or the fmk1Δ
mutant towards a gradient of α-pheromone, in the absence or presence
of PD98059 (selective p42/44 (ERK-type) MAPK inhibitor) or SB202190
(selective p38/Hog1 MAPK inhibitor) (versus wild type, *P < 0.0001).
b, Identification of mpk1Δ and fmk1Δ mpk1Δ deletion mutants by
Southern blot analysis. Genomic DNA of the wild-type and 11 independent
transformants was treated with EcoRI, separated on a 0.7% agarose gel,
transferred to a nylon membrane and hybridized with a DNA probe
corresponding to the 3′ flanking region of the mpk1 gene. Transformants #1,
#4, #7 (wild-type background) and #1, #2, #4, #7 (fmk1Δ background) show a
banding pattern consistent with targeted deletion of the mpk1 gene.
c, d, Identification of mkk2Δ (c) and bck1Δ (d) deletion mutants. Genomic
DNA of independent transformants was used as template for PCR with
the primer pairs mkk2PF + Hyg-G (P) and mkk2TR + Hyg-Y (T), or
bck1PF + Hyg-G (P) and bck1TR + Hyg-Y (T), respectively. Presence of
an amplification product is consistent with homologous replacement of
the target gene. e, f, Identification of complemented strains obtained from
mkk2Δ and bck1Δ mutants. Genomic DNA of independent transformants
obtained after transformation of the indicated mutants with the wild-type
mkk2 (e) or bck1 allele (f) was used as a template for PCR with the primer
pairs mkk2PFN + mkk2GR, or bck1PFN + bck1GR, respectively. Presence
of an amplification product is consistent with integration of an intact gene
copy. g, Elements of the Mpk1 MAPK pathway are required for the cell
wall stress response. Colony phenotypes of the indicated strains grown
on yeast peptone dextrose medium (YPD) in the absence or presence of
the cell-wall-perturbing compounds Calcofluor white (20 µg ml−1) or
Congo red (100 µg ml−1). Plates were spot-inoculated with the indicated
amount of microconidia, incubated for 4 days at 28 °C and scanned. The
experiment was performed twice, each with three plates. Results shown are
from one representative experiment. h, i, Directed growth of germ tubes of
the indicated F. oxysporum strains towards a gradient of glucose (Gluc) (h),
α-pheromone, tomato root exudate or HRP (i) (versus wild type for a given
compound, *P < 0.0001). a, h, i, Data are presented as the mean from two
experiments. n = 500 germ tubes. Error bars show s.d.
Extended Data Figure 6 | Hyphal tip projection angle assay reveals
differential roles of Fmk1 and Mpk1 MAPKs in chemotropism
towards glutamate and α-pheromone. a, Schematic representation of the
chemotropism plate assay based on measurement of hyphal tip projection
angles. b, Average cosine of hyphal tip projection angles of the F. oxysporum
wild-type, fmk1Δ or mpk1Δ strains towards a gradient of Glu, α-pheromone
or the water control. Data are presented as the mean from three experiments.
n = 100 germ tubes. Bars indicate upper and lower 95% significance limits for
cosine means according to a t-test. A cosine of 1 means perfect orientation
while 0 means random orientation. Chemotropism was considered
significant when the lower confidence limit was >0.
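
The average-cosine measure used in this assay could be computed along the following lines. This is a sketch under stated assumptions: angles are taken in degrees relative to the direction of the gradient, and the 95% limits use a normal approximation (z = 1.96) rather than the exact t distribution mentioned in the legend. The function name and the example angles are hypothetical.

```python
import math
import statistics


def mean_cosine_with_limits(angles_deg, z=1.96):
    """Mean cosine of projection angles with approximate 95% limits.

    angles_deg: hyphal tip projection angles in degrees, where 0 means the
    tip points straight up the gradient.
    Returns (mean, lower_limit, upper_limit).
    """
    cosines = [math.cos(math.radians(a)) for a in angles_deg]
    mean = statistics.fmean(cosines)
    sem = statistics.stdev(cosines) / math.sqrt(len(cosines))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical angles for four germ tubes.
mean_cos, lower, upper = mean_cosine_with_limits([0, 60, 300, 0])
is_chemotropic = lower > 0  # criterion stated in the legend
```

A population of randomly oriented tips averages to a cosine near 0, so a lower limit above 0 indicates net orientation towards the source.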
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Extended Data Figure 7 | Loss of Ste2 negatively affects virulence 

of F. oxysporum on tomato plants. a, Phylogram of Ste2 orthologues
from ascomycete fungi. The analysis was conducted using the MEGA5
program. Distances were inferred using the unweighted pair group method
with arithmetic mean (UPGMA). b, Two-dimensional model of the
transmembrane topology of F. oxysporum Ste2. The model was generated
using the SOSUI software (ref. 28). Amino acid residues in the primary and
secondary transmembrane helix are indicated in dark and light green,
respectively. Hydrophobic, positively, and negatively charged residues are
marked in black, blue, and red, respectively. c, d, Southern blot analysis to
identify ste2Δ (c) and fmk1Δ ste2Δ (d) deletion mutants. Genomic DNA of
wild type and the indicated transformants was treated with EcoRI, separated
on a 0.7% agarose gel, transferred to a nylon membrane and hybridized
with a DNA probe corresponding to the 5′ flanking sequence of the ste2
gene. Transformants #1 and #8 in c and #4, #5 and #9 in d show a banding
pattern consistent with targeted deletion of the ste2 gene by homologous
integration of a single construct. e, Loss of Ste2 negatively affects virulence of
F. oxysporum on tomato seedlings. Surface-sterilized tomato seeds (cultivar
Monika) were germinated in glass tubes with 4 ml 0.5% water agar containing
2.5 x 10° ml−1 microconidia of the indicated F. oxysporum strains and
incubated at 28 °C under a daily cycle of 15 h light and 9 h dark. Plant survival
was recorded for 32 days. Plants inoculated with the ste2Δ mutant showed
significantly lower mortality than those inoculated with the wild-type and the
complemented strain (P = 0.02, log-rank test). n = 40 plants. Results shown are
from one representative experiment. Experiments were performed twice with
similar results.
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Extended Data Table 1 | Fusarium oxysporum strains used in this study 


Strain            Genotype                    Gene function                      Reference
FGSC 4287         wild type                                                      (ref. 16)
4287 GFP          PgpdA-GFP; HYG              Green Fluorescent Protein          (ref. 16)
fmk1Δ             fmk1::PHLEO                 MAPK                               (ref. 16)
fmk1Δ + fmk1      fmk1::PHLEO; fmk1::HYG      MAPK                               (ref. 16)
ste12Δ            ste12::HYG                  Homeodomain transcription factor   (ref. 29)
ste12Δ + ste12    ste12::HYG; ste12::PHLEO    Homeodomain transcription factor   (ref. 29)
rho1Δ             rho1::HYG                   Rho-type GTPase                    (ref. 19)
rho1Δ + rho1      rho1::HYG; rho1::PHLEO      Rho-type GTPase                    (ref. 19)
ste2Δ             ste2::HYG                   GPCR                               This study
ste2Δ + ste2      ste2::HYG; ste2::PHLEO      GPCR                               This study
fmk1Δ ste2Δ       fmk1::PHLEO; ste2::HYG      MAPK/GPCR                          This study
mpk1Δ             mpk1::HYG                   MAPK                               This study
mpk1Δ + mpk1      mpk1::HYG; mpk1::PHLEO      MAPK                               This study
fmk1Δ mpk1Δ       fmk1::PHLEO; mpk1::HYG      MAPK/MAPK                          This study
bck1Δ             bck1::HYG                   MAPKKK                             This study
bck1Δ + bck1      bck1::HYG; bck1::PHLEO      MAPKKK                             This study
mkk2Δ             mkk2::HYG                   MAPKK                              This study
mkk2Δ + mkk2      mkk2::HYG; mkk2::PHLEO      MAPKK                              This study
ste11Δ            ste11::HYG                  MAPKKK                             This study
ste11Δ + ste11    ste11::HYG; ste11::PHLEO    MAPKKK                             This study
ste7Δ             ste7::HYG                   MAPKK                              This study
ste7Δ + ste7      ste7::HYG; ste7::PHLEO      MAPKK                              This study

Ref. 29 is cited in this Table.


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Extended Data Table 2 | Oligonucleotides used in this study 


Primer          Sequence                                                   Use
gpdA15B         CGAGACCTAATACAGCCCCT                                       hygB cassette
trpter8B        GGATCCAAACAAGTGTACCTGTGCATTC                               hygB cassette
Hyg-G           CGTTGCAAGACCTGCCTGAA                                       hygB cassette
Hyg-Y           GGATGCCTCCGCTCGAAGTA                                       hygB cassette
Ste2PFO         GCAGGCACAAAGAACAGCAAT                                      ste2 knockout/complementation
Ste2PFN         GTGGCAGAGGAGAGAGCTATAG                                     ste2 knockout/complementation
Ste2PFN2        ATTACACCAGCAGTGTTTGCC                                      ste2 knockout/complementation
Ste2PR          TAAAGATTGGAAGTGAAAGGGG                                     ste2 knockout/complementation
Ste2PRGPDA15B   TGGTCGTTGTAGGGGCTGTATTAGGTCTCG(A)TAAAGATTGGAAGTGAAAGGGG*   ste2 knockout/complementation
Ste2TR          TCAACATCAACAAGCGAAAGAG                                     ste2 knockout/complementation
Ste2TRN         AACTTAGGGGCTCTGAGGATG                                      ste2 knockout/complementation
Ste2TFTrpter8B  TTTACCCAGAATGCACAGGTACACTTGTTT(A)GACCAAAACAAAACTTCTAGCG*   ste2 knockout/complementation
Ste2PFO2        ACCTGGATACACGAACGATAC                                      ste2 knockout/complementation
Mpk1TF1         TGGTCGTTGTAGGGGCTGTATTAGGTCTCG(A)CAGTATTTCCCTTCAGCCAAC*    mpk1 knockout/complementation
Mpk1TR1         TCAAGACCAATGTACCTACGG                                      mpk1 knockout/complementation
Mpk1TR2         CTTTTGGACGAACTGTGAACC                                      mpk1 knockout/complementation
Mpk1PF1         TGGAGAAGAGTAAATGGACGG                                      mpk1 knockout/complementation
Mpk1PF2         AGGGAAACGAGGTAGGTTACA                                      mpk1 knockout/complementation
Mpk1PR1         TTTACCCAGAATGCACAGGTACACTTGTTT(A)CTGGTGATGTGGCTGATTTGT*    mpk1 knockout/complementation
Mpk1PFO         TCCACAGACTACAGAAGAACG                                      mpk1 knockout/complementation
Mpk1-R          TCTCCTAGAGGCATCCAGTCC                                      mpk1 knockout/complementation
STE11 PF        TAGGTGATTAGACGTGGGAAG                                      ste11 knockout/complementation
STE11 PFN       GGTCTAGGCTCACTTTGTTTC                                      ste11 knockout/complementation
STE11 PR        TTTACCCAGAATGCACAGGTACACTTGTTT(A)CATTGTGGGCTGAGAAGGAAC*    ste11 knockout/complementation
STE11 TF        TGGTCGTTGTAGGGGCTGTATTAGGTCTCG(A)TTGACCACAACCTACGACCTA*    ste11 knockout/complementation
STE11 TRN       CTTCTGATATGCCGATGGAAC                                      ste11 knockout/complementation
STE11 TR        CATCAGTCCTTCTCAATCCAG                                      ste11 knockout/complementation
STE11 GR        ATGTTCAGGGATTGTAGGAGC                                      ste11 knockout/complementation
STE7 PF         CCTGAGCCGAGTATGGAATTG                                      ste7 knockout/complementation
STE7 PFN        GTCCCCTTATGGCGAATGAAT                                      ste7 knockout/complementation
STE7 PR         TTTACCCAGAATGCACAGGTACACTTGTTT(A)GATAGAATTGACAAGCTCGCC*    ste7 knockout/complementation
STE7 TF         TGGTCGTTGTAGGGGCTGTATTAGGTCTCG(A)TACCGTCTTCAAATCCCAAGG*    ste7 knockout/complementation
STE7 TRN        CAGTGGCTTCGTTAATCAGTC                                      ste7 knockout/complementation
STE7 TR         AATGGAAGAGAGTGGAAGAGG                                      ste7 knockout/complementation
STE7 GR         TAAATGATCTTCAGGGTTAGGC                                     ste7 knockout/complementation
BCK1 PF         GAGACTTTTGGAATGGAGAGG                                      bck1 knockout/complementation
BCK1 PFN        GGTAGATTGAGTTACGTCTGG                                      bck1 knockout/complementation
BCK1 PR         TTTACCCAGAATGCACAGGTACACTTGTTT(A)TCTTGAGGCTGAGATTGAGAC*    bck1 knockout/complementation
BCK1 TF         TGGTCGTTGTAGGGGCTGTATTAGGTCTCG(A)GTCTGGGTTGTGTAGTCCTG*     bck1 knockout/complementation
BCK1 TRN        TGGTGATGTTCGTCAAGAGATA                                     bck1 knockout/complementation
BCK1 TR         GTTTCCTTGTTGCCTCGATCT                                      bck1 knockout/complementation
BCK1 GR         ATCCAGAATACCGAACCTTGC                                      bck1 knockout/complementation
MKK2 PF         TAGCTTTGGATTGCGGTTGGA                                      mkk2 knockout/complementation
MKK2 PFN        ACGAGAATGACGATGTGTGTG                                      mkk2 knockout/complementation
MKK2 PR         TTTACCCAGAATGCACAGGTACACTTGTTT(A)GGGTAGTGGAGTTGAATCAGA*    mkk2 knockout/complementation
MKK2 TF         TGGTCGTTGTAGGGGCTGTATTAGGTCTCG(A)TTTCTTTTGTCTGGGGTTGGG*    mkk2 knockout/complementation
MKK2 TRN        CCGAGAATAGCATCTTCAGAC                                      mkk2 knockout/complementation
MKK2 TR         CAGATTGCTCGTTTCCTCAAG                                      mkk2 knockout/complementation
MKK2 GR         GCTCGTCTGTTGGGTTGTTTT                                      mkk2 knockout/complementation
Tap2_for1       CTCTGTTTCTTCCAAATAGAC                                      tmp2 cloning
Tap1/2_rev      CTCGAGTCACATAGAAGCCACAGAAG                                 tmp2 cloning
Tmp2FPCR_for    ATTAGTCTACATTTCGAGGACTGC
Tmp2FPCR_rev    GTCCTCGAAATGTAGACTAATGAG
Cevi1_for1      CATATGCAATTAAGTGCAACATTTTACG
Cevi1_rev       CTCGAGCTAATCAACTAATTAACCCTCT

*The sequence shown in italics corresponds to the complementary region of the gpdA15B (Ste2PRGPDA15B, Mpk1TF1, STE11TF, STE7TF, BCK1TF and MKK2TF) or trpter8B (Ste2TFTrpter8B, Mpk1PR1,
STE11PR, STE7PR, BCK1PR and MKK2PR) primers.
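
The footnote's statement about complementary regions can be checked with a short reverse-complement routine. The sequences below are copied from the table; the function and variable names are hypothetical, and the comparison with trpter8B assumes that only its first 22 reverse-complement bases are carried in the tail.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


GPDA15B = "CGAGACCTAATACAGCCCCT"
TRPTER8B = "GGATCCAAACAAGTGTACCTGTGCATTC"
# 5' tails shared by the chimeric primers marked with an asterisk.
GPDA_TAIL = "TGGTCGTTGTAGGGGCTGTATTAGGTCTCG"
TRPTER_TAIL = "TTTACCCAGAATGCACAGGTACACTTGTTT"

# The gpdA15B-type tail ends with the full reverse complement of gpdA15B.
gpda_match = GPDA_TAIL.endswith(reverse_complement(GPDA15B))
# The trpter8B-type tail ends with the first 22 bases of the reverse
# complement of trpter8B (everything except the final GGATCC).
trpter_match = TRPTER_TAIL.endswith(reverse_complement(TRPTER8B)[:22])
```

Both checks pass for the sequences as listed, which is consistent with these tails providing the overlap needed to fuse the flanking fragments to the resistance cassette.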
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Epithelial-to-mesenchymal transition 
is dispensable for metastasis but induces 
chemoresistance in pancreatic cancer 


Xiaofeng Zheng1*, Julienne L. Carstens1*, Jiha Kim1, Matthew Scheible1, Judith Kaye1, Hikaru Sugimoto1, Chia-Chin Wu2,
Valerie S. LeBleu1 & Raghu Kalluri1,3,4


Diagnosis of pancreatic ductal adenocarcinoma (PDAC) is 
associated with a dismal prognosis despite current best therapies; 
therefore new treatment strategies are urgently required. Numerous 
studies have suggested that epithelial-to-mesenchymal transition 
(EMT) contributes to early-stage dissemination of cancer cells and 
is pivotal for invasion and metastasis of PDAC’*. EMT is associated 
with phenotypic conversion of epithelial cells into mesenchymal- 
like cells in cell culture conditions, although such defined 
mesenchymal conversion (with spindle-shaped morphology) of 
epithelial cells in vivo is rare, with quasi-mesenchymal phenotypes 
occasionally observed in the tumour (partial EMT)*°. Most studies 
exploring the functional role of EMT in tumours have depended 
on cell-culture-induced loss-of-function and gain-of-function 
experiments involving EMT-inducing transcription factors such 
as Twist, Snail and Zeb1 (refs 2,3,7-10). Therefore, the functional 
contribution of EMT to invasion and metastasis remains unclear*®, 
and genetically engineered mouse models to address a causal 
connection are lacking. Here we functionally probe the role of 
EMT in PDAC by generating mouse models of PDAC with deletion 
of Snail or Twist, two key transcription factors responsible for 
EMT. EMT suppression in the primary tumour does not alter the 
emergence of invasive PDAC, systemic dissemination or metastasis. 
Suppression of EMT leads to an increase in cancer cell proliferation 
with enhanced expression of nucleoside transporters in tumours, 
contributing to enhanced sensitivity to gemcitabine treatment 
and increased overall survival of mice. Collectively, our study 
suggests that Snail- or Twist-induced EMT is not rate-limiting 
for invasion and metastasis, but highlights the importance of 
combining EMT inhibition with chemotherapy for the treatment 
of pancreatic cancer. 

We crossed Twist1loxP/loxP (Twist1L/L) or Snai1loxP/loxP (Snai1L/L)
mice with Pdx1-cre;LSL-KrasG12D;P53R172H/+ (KPC) to generate
the Pdx1-cre;LSL-KrasG12D;P53R172H/+;Twist1L/L (KPC;TwistcKO)
and the Pdx1-cre;LSL-KrasG12D;P53R172H/+;Snai1L/L (KPC;SnailcKO)
mice, respectively. The resultant progeny were born in an expected
Mendelian ratio, without overt phenotypic findings other than the
anticipated emergence of spontaneous pancreatic cancer (Extended
Data Fig. 1a). Genetic deletion of Snail or Twist1 did not significantly
delay pancreatic tumorigenesis, alter tumour histopathology features
or local invasion (Fig. 1a–c and Extended Data Table 1). KPC;TwistcKO
and KPC;SnailcKO mice displayed similar tumour burden compared
to KPC control mice (Extended Data Fig. 1b) and insignificant
differences in overall survival (Fig. 1d). Loss of Twist1 or Snail expression
in the pancreas epithelium was confirmed by in situ hybridization
coupled with CK8 epithelial immunolabelling (Fig. 1e and Extended
Data Fig. 1c) as well as immunolabelling for Twist and Snail (Extended
Data Fig. 1d). Significant suppression of EMT was noted (Fig. 1f, g
and Extended Data Fig. 1e, f). Lineage tracing (Fig. 1f and Extended
Data Fig. 1e) and immunolabelling of the primary tumour (Fig. 1g)
showed a significant decrease in the frequency of epithelial cells with
expression of the mesenchymal marker αSMA (EMT+ cells) and
a decrease in expression of the EMT-inducing transcription factor
Zeb1 (Fig. 1h). Global gene expression profiling of tumours revealed a
decrease in expression of EMT-associated genes (including Snail and
Twist1) in KPC;SnailcKO and KPC;TwistcKO mice compared to KPC
control (Extended Data Fig. 1f). Loss of Snail and Twist enhanced
E-cadherin expression and suppressed Zeb2 and Sox4 expression
in cancer cells (Extended Data Fig. 2a–c). Snai2 (Slug) expression
was restricted to early pancreatic intraepithelial neoplasia (PanIN)
lesions in all the experimental groups with no observed expression
in advanced tumours and was significantly reduced in KPC;SnailcKO
and KPC;TwistcKO mice compared to KPC control mice (Extended
Data Fig. 2d).

While desmoplasia, including extracellular matrix (ECM) and
myofibroblast content (Fig. 1i and Extended Data Fig. 2e, f), tumour
vessel density (Extended Data Fig. 2g), intratumoural hypoxia
(Extended Data Fig. 2h), CD3+ T-cell infiltration (Extended Data
Fig. 2i), and cancer cell apoptosis were unaffected by Twist/Snail
deletion in KPC tumours (Fig. 2a), the proliferation of cancer cells in mice
with suppressed EMT was significantly increased (Fig. 2b), as shown
previously in mouse models of breast cancers!!~!*. Immunostaining 
experiments further revealed that EMT+ cancer cells are largely
Ki67− (Extended Data Fig. 3a). Altogether, these data suggest that
EMT driven by Twist/Snail transcription factors is dispensable for
initiation and progression of primary pancreatic cancer.

Next, we investigated whether suppression of EMT impacts
invasion and metastasis. The number of YFP+ circulating tumour
cells from lineage-traced KPC and KPC;TwistcKO was found to be
unchanged (Fig. 2c and Extended Data Fig. 3b), and expression
of cancer-cell-specific KrasG12D mRNA in the blood from KPC,
KPC;TwistcKO and KPC;SnailcKO mice was unaffected (Fig. 2d),
suggesting that suppression of EMT in pancreatic tumours does not
impact the rate of systemic dissemination of cancer cells. Extensive
histopathological analyses, coupled with CK19 or YFP immunostaining of
distant metastatic target organs, namely the liver, lung and spleen,
indicated a similar frequency of metastasis in EMT-suppressed tumours
when compared to control tumours (Fig. 2e, Extended Data Fig. 3c
and Extended Data Tables 1 and 2). The metastases were negative for
Twist, Snail, Zeb1 and αSMA, with the exception of a few KPC
metastatic cells that expressed αSMA or Zeb1 (Extended Data Fig. 3d–f),


1Department of Cancer Biology, Metastasis Research Center, University of Texas MD Anderson Cancer Center, Houston, Texas 77054, USA. 2Department of Genomic Medicine, University of Texas
MD Anderson Cancer Center, Houston, Texas 77054, USA. 3Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA. 4Department of Bioengineering,
Rice University, Houston, Texas 77030, USA.
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Figure 1 | EMT inhibition does not alter primary tumour progression. 
a, Representative haematoxylin and eosin (H&E)-stained primary tumours
(scale bar, 100 µm). b, Relative percentages of each primary tumour
histological tissue phenotype. n = 31 (KPC), 14 (KPC;TwistcKO) and
30 (KPC;SnailcKO) mice; error bars represent s.d. c, Local invasiveness.
n = 31 (KPC), 14 (KPC;TwistcKO) and 30 (KPC;SnailcKO) mice; error bars
represent s.d. d, Overall survival. n = 29 (KPC), 12 (KPC;TwistcKO) and
33 (KPC;SnailcKO) mice. e, Twist1 or Snail in situ hybridization (black) with
CK8 (red) immunolabelling in primary tumours (n = 3 mice for all groups;
scale bar, 50 µm). Relative percentages of Twist1+CK8+ or Snail+CK8+
double-positive cells are shown below (two-tailed t-test).


while being positive for E-cadherin and Ki-67 (Extended Data Fig. 3g, h).
The proliferation rate of cancer cells in the metastases was similar
in KPC, KPC;SnailcKO and KPC;TwistcKO mice (Extended Data

526 | NATURE | VOL 527 | 26 NOVEMBER 2015 




f, αSMA immunolabelling in YFP lineage-traced primary tumours (n = 3
mice for both groups; scale bar, 50 µm; two-tailed t-test). g, αSMA (red), CK8
(green) and DAPI (blue) immunolabelling in primary tumours; white arrows
indicate double-positive cells (n = 4 mice for all groups; scale bar, 20 µm). h,
Zeb1 immunolabelling (n = 5 (KPC), 6 (KPC;TwistcKO) and 6 (KPC;SnailcKO)
mice; scale bar, 50 µm; inset scale bar, 20 µm). i, Masson's trichrome stain
(MTS) (n = 8 (KPC), 7 (KPC;TwistcKO) and 7 (KPC;SnailcKO) mice; scale
bar, 100 µm; error bars represent s.d.). Unless otherwise indicated error bars
represent s.e.m., percentages represent per cent change from control and
significance was determined by one-way ANOVA. *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001; NS, not significant.


Fig. 3h). Collectively, the results indicated that the deletion of Twist1 
or Snail in genetically engineered mouse models of PDAC did not 
reduce metastatic disease. 
                   KPC      KPC;TwistcKO   KPC;SnailcKO
Liver metastasis   11/31    6/14           13/30
Lung metastasis    11/31    4/14           9/30
Spleen invasion    2/30     2/14           5/29
Any metastasis     17/31    8/14           18/30

No significant differences
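
The χ² comparison of metastasis frequencies can be illustrated with the liver counts from the table above. This is a minimal sketch, not the authors' analysis: it uses an uncorrected Pearson χ² test, and the P value uses the closed form exp(−χ²/2), which holds only for two degrees of freedom (three groups, two outcomes). The function names are hypothetical.

```python
import math


def chi_square_statistic(groups):
    """Pearson chi-square statistic for a groups-by-outcome table.

    groups: list of (positive, total) counts, one pair per genotype.
    Returns (statistic, degrees_of_freedom).
    """
    total_pos = sum(p for p, _ in groups)
    total_n = sum(n for _, n in groups)
    rate = total_pos / total_n  # pooled fraction of positive animals
    stat = 0.0
    for pos, n in groups:
        expected_pos = n * rate
        expected_neg = n * (1 - rate)
        stat += (pos - expected_pos) ** 2 / expected_pos
        stat += ((n - pos) - expected_neg) ** 2 / expected_neg
    return stat, len(groups) - 1


def p_value_two_df(stat):
    """Upper-tail P value of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Liver metastasis: KPC 11/31, KPC;TwistcKO 6/14, KPC;SnailcKO 13/30.
liver_stat, liver_df = chi_square_statistic([(11, 31), (6, 14), (13, 30)])
liver_p = p_value_two_df(liver_stat)
```

For these counts the statistic is about 0.45 with P of roughly 0.8, in keeping with the "no significant differences" note under the table.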
Figure 2 | EMT inhibition does not alter invasion and metastasis.
a, b, Primary tumour immunolabelling for cleaved caspase-3 (a; n = 6 mice
for all groups; scale bar, 50 µm) and Ki67 (b; n = 7 (KPC), 7 (KPC;TwistcKO)
and 9 (KPC;SnailcKO) mice; scale bar, 100 µm). c, Percentage of YFP+
circulating tumour cells (CTCs) (n = 8 mice for both groups; two-tailed
t-test; error bars represent s.d.). d, KrasG12D expression in whole blood
cell pellets (n = 5 (KPC), 3 (KPC;TwistcKO) and 5 (KPC;SnailcKO) mice;
error bars represent s.d.). e, Haematoxylin and eosin staining and CK19
immunolabelling of metastatic liver nodules. Metastatic tumour nodules (T)
outlined by a dotted line (scale bar, 100 µm). A table presenting the number
of positive tissues out of total tissues examined is shown below (χ² analysis).
f, Expression analysis of Twist1 and Snail in cultured primary tumour
cell lines (n = 4 (KPC) and 5 (KPC;TwistcKO) individual cell lines (Twist1)


To evaluate whether cancer cells from the pancreas with and without
EMT program differentially benefited from impaired proliferation
to form secondary tumours, we isolated cancer cells from KPC,
KPC;TwistcKO and KPC;SnailcKO mice to assay their organ colonization
potential. Twist1 was significantly reduced and Snail expression was
undetectable in cancer cells isolated from Twist- and Snail-deleted
tumours, respectively (Fig. 2f). Short-term potential to form tumour
spheres (associated with putative cancer stem phenotype) appeared
similar in TwistcKO and SnailcKO KPC cells when compared to control
KPC cells (Fig. 2g)>*!4"!®, Lung colonization frequencies following 
i.v. injection of KPC cancer cells (Twist- or Snail-deleted) were similar
to the control KPC cancer cells (Fig. 2h). These results suggest that a 
favoured epithelial phenotype of cancer cells (via suppression of EMT) 
did not impact the capacity to form tumour spheres or their ability for 
organ colonization”. 

Cancer cell EMT is associated with gemcitabine drug resistance in 
PDAC patients and in the orthotopic mouse models of PDAC1?*!8-73, 
Moreover, enhanced frequency of EMT+ cancer cells in pancreatic
tumours is associated with poor survival?*”°. To determine whether 
EMT suppression enhances PDAC sensitivity to gemcitabine chemo- 
therapy, we tested the gemcitabine sensitivity of cancer cells with 
                      KPC           KPC;TwistcKO    KPC;SnailcKO
Cell line             1      2      1       2       1      2      3
Lung colonization     3/5    4/5    9/11    4/4     4/4    5/5    5/5
Liver colonization    1/5    0/5    2/11    0/4     0/4    1/5    1/5
Spleen colonization   0/5    0/5    0/11    0/4     0/4    0/6    0/5

No significant differences
or 4 (KPC) and 6 (KPC;SnailcKO) individual cell lines (Snai1); one-tailed
t-test of ΔCt, error bars represent s.d.). g, Bright-field or YFP images and
quantification of sphere number in cultured tumour cell lines (n = 3 (KPC),
2 (KPC;TwistcKO) and 3 (KPC;SnailcKO) individual cell lines; scale bar,
50 µm). h, Haematoxylin and eosin images (scale bar, 100 µm) of colonized
lungs from intravenously injected cultured primary tumour cell lines
KPC (n = 5 (cell line 1) and 5 (cell line 2) mice injected) and KPC;TwistcKO
(n = 11 (cell line 1) and 4 (cell line 2) mice injected) and KPC;SnailcKO
(n = 4 (cell line 1), 5 (cell line 2) and 5 (cell line 3) mice injected). A table
presenting the number of colonized tissues out of total tissues examined is
shown below (χ² analysis). Unless otherwise indicated error bars represent
s.e.m. and significance was determined by one-way ANOVA. *P < 0.05,
**P < 0.01, ****P < 0.0001; NS, not significant; ND, not detected.


suppressed EMT in KPC mice. Equilibrative nucleoside transporter 
(ENT1) and concentrating nucleoside transporter (Cnt3) were sig- 
nificantly upregulated in cancer cells lacking Snail and Twist, while 
ENT2 expression was unchanged (Fig. 3a-c). KPC, KPC;SnailcKO 
and KPC;TwistcKO mice were treated with gemcitabine and tumour 
burden was monitored by MRI (Extended Data Table 3). Tumour 
progression was suppressed in KPC;SnailcKO and KPC;TwistcKO mice 
when compared to treated KPC control mice (Fig. 3d). KPC;SnailcKO 
and KPC;TwistcKO mice treated with gemcitabine showed improved 
histopathology and increased survival (Fig. 3e-g). 

Cancer cells isolated from the tumours of KPC;SnailcKO and 
KPC;TwistcKO mice showed epithelial morphology (Extended 
Data Fig. 4a) and reduced expression of mesenchymal genes 
compared to KPC cancer cell lines (Extended Data Fig. 4b). 
However, in tissue culture conditions (2D culture on plastic), 
equilibrative nucleoside transporters (ENT1/ENT2/ENT3) 
showed similar expression patterns (Extended Data 
Fig. 4b) and expression of concentrating nucleoside transporters 
(Cnt1/Cnt3) was not detected (data not shown). Increased proliferation 
of KPC;SnailcKO and KPC;TwistcKO cancer cells compared to 
KPC control cells (Extended Data Fig. 4c) probably accounted for 

Figure 3 | EMT inhibition sensitizes tumours to gemcitabine in KPC 
mice. a-c, Primary tumour immunolabelling for ENT1 (a), ENT2 (b) 
and Cnt3 (c) (n = 6 (KPC), 5 (KPC;TwistcKO) and 4 (KPC;SnailcKO) mice; 
scale bar, 100 µm; error bars represent s.e.m., two-tailed t-test). d, MRI 
tumour volumes of KPC plus gemcitabine (+ gem.) (n = 13 mice, 10 died 
before day 19), KPC;TwistcKO + gem. (n = 15 mice, 5 died before day 19) 
and KPC;SnailcKO + gem. (n = 20 mice, 9 died before day 19). One-way 


the increased sensitivity to gemcitabine and erlotinib in this setting 
(Extended Data Fig. 4d). 

Next, we crossed the SnailL/L to the PDAC mouse model, Ptf1a 
(P48)-cre;LSL-KrasG12D;Tgfbr2L/L (KTC) to generate Ptf1a (P48)- 
cre;LSL-KrasG12D;Tgfbr2L/L;SnailL/L (KTC;SnailcKO). The KTC model 
offers a reliable and penetrant disease progression rate with a consistent 
timeline of death due to PDAC. Similar to the KPC;SnailcKO mice, 
KTC;SnailcKO deletion exhibited suppression of EMT but did not affect 
primary tumour histopathology, lifespan, local invasion, desmoplasia 
or frequency of apoptosis (Fig. 4f and Extended Data Figs 5a-e and 
6a). KTC;SnailcKO mice presented with significantly reduced Zeb1 
expression in cancer cells but enhanced expression of Cnt3, ENT2 and 
proliferation (Extended Data Fig. 5e). ENT1 expression was unchanged 
in KTC;SnailcKO mice compared to KTC mice (Extended Data Fig. 6a). 
KTC;SnailcKO mice demonstrated enhanced response to gemcitabine 
therapy, with significant normal parenchymal area and reduced tumour 
tissue (Fig. 4a-c). Gemcitabine therapy in KTC;SnailcKO mice reduced 
tumour burden (Fig. 4d) and significantly improved overall survival 
(Fig. 4e) when compared to gemcitabine-treated control KTC mice. 
Gemcitabine therapy specifically increased cancer cell apoptosis and 
removed enhanced proliferation observed in EMT-suppressed tumours 
(Fig. 4g and Extended Data Fig. 5e), without impacting the desmoplastic 
reaction (Extended Data Fig. 6b). Overall, these results suggest 

ANOVA comparing mean tumour volumes on day 0 and day 19, error 
bars represent s.d. e, Survival on gemcitabine treatment to end point 
(day 21). f, Haematoxylin and eosin-stained primary tumours (scale bar, 
100 µm). g, Relative percentages of each histological tissue phenotype of 
end-point mice (n = 3 (KPC + gem.), 9 (KPC;TwistcKO + gem.) and 
11 (KPC;SnailcKO + gem.) mice; error bars represent s.d.; two-tailed 
t-test). *P < 0.05, **P < 0.01; NS, not significant. 


an enhanced sensitivity of EMT-suppressed cancer cells to gemcitabine. 
Both ENT2 and Cnt3 were upregulated in EMT-suppressed tumours 
(Fig. 4g). These data support a possible mechanistic connection 
between EMT and resistance to chemotherapy in PDAC. 
Collectively, our studies provide a comprehensive functional anal- 
ysis of EMT in PDAC progression and metastasis. Absence of either 
Twist1 or Snail did not alter cancer progression or the capacity for 
local invasion or metastasis to lung and liver in genetically engineered 
mouse models of PDAC. Metastasis occurs despite a significant loss 
of EMT with either the deletion of Snail or Twist, and in both set- 
tings, Zeb1, Sox4, Slug and Zeb2 are also significantly suppressed. 
Nevertheless, it is possible that other EMT-inducing factors may com- 
pensate for the loss of Snail or Twist to induce invasion and metastasis. 
While Pdx1 is expressed during the development of the pancreas (in 
early pancreatic buds and all three major lineages of the pancreas: 
ductal, acinar and β-islets), its expression is largely repressed in the 
adult exocrine pancreas. Therefore, deletion of Snail or Twist 
occurs at the embryonic stage and mice are born normal and exhibit 
normal pancreas histology before the onset of cancer. The mice with 
Snail or Twist deletion develop PanIN lesions at the same frequency 
as the control mice. One could argue that suppression of EMT start- 
ing from the inception of cancer could have launched compensatory 
mechanisms to overcome EMT-dependent invasion and metastasis. 

Figure 4 | EMT inhibition sensitizes tumours to gemcitabine in KTC 
mice. a, b, Haematoxylin and eosin-stained primary tumour (scale bar, 
100 µm) and relative percentage of each histological tissue phenotype 
in KTC + gem. (n = 5) and KTC;SnailcKO + gem. (n = 7) mice (error 
bars represent s.d.). c, Local invasiveness (n = 5 (KTC + gem.) and 
7 (KTC;SnailcKO + gem.) mice; error bars represent s.d.). d, Pancreatic 
mass (n = 3 (KTC + gem.) and 4 (KTC;SnailcKO + gem.) mice; error 
bars represent s.d.). e, Overall survival of KTC + gem. (n = 8) and 
KTC;SnailcKO + gem. (n = 4) mice. f, Overall survival of KTC (n = 6) and 
KTC;SnailcKO (n = 3) mice. g, αSMA (red), CK8 (green) and DAPI (blue) 


However, such compensation is not observed with respect to chemoresistance, 
and previous studies have demonstrated that EMT and cancer 
cell dissemination are observed even before PDAC lesions are detected 
in KPC mice. 

Our study demonstrates that EMT results in suppression of 
cancer cell proliferation and suppression of drug transporter and concentrating 
proteins, therefore inadvertently protecting EMT+ cells 
from anti-proliferative drugs such as gemcitabine. The correlation of 
decreased survival of pancreatic cancer patients with increased EMT is 
probably due to their impaired capacity to respond to gemcitabine and 
chemotherapeutics, which is a standard of care for most patients. 
A compromised response to chemotherapy probably also explains 
higher metastatic disease in association with decreased survival of 

staining of primary tumours; white arrows indicate double-positive cells 
(n = 4 mice for both groups; scale bar, 20 µm), and immunolabelling for 
Zeb1 (n = 4 (KTC + gem.) and 5 (KTC;SnailcKO + gem.) mice; scale bar, 
50 µm; inset scale bar, 20 µm), cleaved caspase-3 (n = 4 (KTC + gem.) 
and 5 (KTC;SnailcKO + gem.) mice; scale bar, 50 µm), Ki67 (n = 4 (KTC + 
gem.) and 5 (KTC;SnailcKO + gem.) mice; scale bar, 100 µm), ENT2 (n = 5 
mice for both groups; scale bar, 100 µm), and Cnt3 (n = 5 mice for both 
groups; scale bar, 100 µm). Unless otherwise indicated error bars represent 
s.e.m. and significance was determined by two-tailed t-tests. *P < 0.05, 
**P < 0.01, ***P < 0.001; NS, not significant. 


patients with enhanced EMT signatures. Collectively, our study offers 
the opportunity to evaluate the potential of targeting EMT to enhance 
efficacy of chemotherapy and targeted therapies. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 

Mice. Characterization of disease progression and genotyping for the Pdx1- 
cre;LSL-KrasG12D;P53R172H/+ (herein referred to as KPC) and Ptf1a (P48)-cre;LSL- 
KrasG12D;Tgfbr2L/L (herein referred to as KTC) mice were previously described. 
These mice were bred to SnailL/L (herein referred to as SnailcKO), Twist1L/L 
(herein referred to as TwistcKO), and R26-LSL-EYFP. SnailcKO mice were kindly 
provided by S. J. Weiss. TwistcKO mice were kindly provided by R. R. Behringer 
via the Mutant Mouse Regional Resource Center (MMRRC) repository. The 
resulting progeny were referred to as KPC, KPC;SnailcKO, KPC;TwistcKO, KTC 
and KTC;SnailcKO mice and were maintained on a mixed genetic background. 
Both males and females were used indiscriminately. Mice were given gemcitabine 
(G-4177, LC Laboratories) via intraperitoneal injection (i.p.) every other day at 
50 mg kg−1 of body weight. Hypoxyprobe was injected in a subset of mice i.p. 
at 60 mg kg−1 of body weight 30 min before euthanasia. For in vivo colonization 
assays, one million KPC, KPC;TwistcKO and KPC;SnailcKO tumour cells in 100 µl of 
PBS were injected intravenously via the retro-orbital venous sinus. Four to eleven 
mice were injected per cell line. All mice were euthanized at 15 days post injection. 
All mice were housed under standard housing conditions at MD Anderson Cancer 
Center (MDACC) animal facilities, and all animal procedures were reviewed and 
approved by the MDACC Institutional Animal Care and Use Committee. Tumour 
growth met the standard of a diameter less than or equal to 1.5 cm. Investigators 
were not blinded to group allocation but were blinded for the assessment of the 
phenotypic outcome by histological analyses. No statistical methods were used to 
predetermine sample size and the experiments were not randomized. 

Histology and histopathology. Histology, histopathological scoring, Masson's 
trichrome staining (MTS), and Picrosirius Red have been previously described. 
Formalin-fixed tissues were embedded in paraffin and sectioned at 5 µm thickness. 
MTS was performed using Gomori’s Trichrome Stain Kit (38016SS2, Leica 
Biosystems). Picrosirius red staining for collagen was performed using 0.1% picro- 
sirius red (Direct Red80; Sigma) and counterstained with Weigert’s haematoxylin. 
Sections were also stained with haematoxylin and eosin (H&E). Histopathological 
measurements were assessed by scoring H&E-stained tumours for relative per- 
centages of each histopathological phenotype: normal (non-neoplastic), PanIN, 
well-differentiated PDAC, moderately-differentiated PDAC, poorly-differentiated 
PDAC, sarcomatoid carcinoma, or necrosis. When tumour histology was missing 
or of poor quality, the mice were excluded from primary tumour histological anal- 
ysis and this was determined blinded from genotype information. A histological 
invasion score of the tumour cells into the surrounding stroma was scored on a 
scale of 0 to 2, with 0 indicating no invasion and 2 indicating high invasion, where 
invasion is defined as tumour cell dissemination throughout the stroma away from 
clearly defined epithelial ‘nests’. Microscopic metastases were observed in H&E-stained 
tissue sections of the liver, lung and spleen. Positivity (one or more lesions 
in a tissue) was confirmed using CK19 and YFP immunohistochemistry. This data 
has been presented as a contingency table (Fig. 2e) and represented as the number 
of positive tissues out of the number of tissues scored. The ‘Any’ metastasis score 
is the number of mice positive for a secondary lesion found anywhere throughout 
the body out of the total number of mice scored. 

Immunohistochemistry and Immunofluorescence. Tissues were fixed in 10% 
formalin overnight, dehydrated, and embedded in paraffin, and 5-µm-thick sections 
were then processed for analyses. Immunohistochemical analysis was performed 
as described. Heat-mediated antigen retrieval in 1 mM EDTA + 0.05% Tween20 
(pH 8.0) for one hour (pressure cooker) was performed for Snail and Twist; 10 mM 
citrate buffer, pH 6.0, was used for one hour (microwave) for Ki67 or 10 min for 
all other antibodies. Primary antibodies are as follows: αSMA (M0851, DAKO, 
1:400 or ab5694, Abcam, 1:400), cleaved caspase-3 (9661, Cell Signaling, 1:200), 
CD3 (A0452, DAKO, 1:200), CD31 (Dia310M, DiaNova, 1:10), CK8 (TROMA-1, 
Developmental Studies Hybridoma Bank, 1:50), CK19 (ab52625, Abcam, 1:100), 
Cnt3 (HPA023311, Sigma-Aldrich, 1:400), ENT1 (LS-B3385, LifeSpan Bio., 1:100), 
E-cadherin (3195S, Cell Signaling, 1:400), ENT2 (ab48595, Abcam, 1:200), Ki67 
(RM-9106, Thermo Scientific, 1:400), Slug (9585, Cell Signaling, 1:200), Snail 
(ab180714, Abcam, 1:100), Sox4 (ab86809, Abcam, 1:200), Twist (ab50581, Abcam, 
1:100), YFP (ab13970, Abcam, 1:1000), Zeb1 (NBP1-05987, Novus, 1:500), and 
Zeb2 (NBP1-82991, Novus, 1:100). Sections for pimonidazole adduct (HPI Inc., 
1:50) or αSMA immunohistochemistry staining were blocked with M.O.M. kit 
(Vector Laboratories, West Grove, PA) and developed by DAB according to the 
manufacturer’s recommendations. Alternatively, for immunofluorescence, sections 
were dual-labelled using secondary antibodies conjugated to Alexa Fluor 488 or 
594 or tyramide signal amplification (TSA, PerkinElmer) conjugated to FITC. 
Lineage-traced (YFP-positive) EMT analysis was performed on 8-µm-thick O.C.T. 
medium (TissueTek)-embedded frozen sections. Sections were stained for αSMA 
(ab5694, Abcam, 1:400) followed by Alexa Fluor 680 conjugated secondary antibody. 
Bright-field imagery was obtained on a Leica DM1000 light microscope or 
the Perkin Elmer 3DHistotech Slide Scanner. Fluorescence imagery was obtained 
on a Zeiss Axio Imager.M2 or the Perkin Elmer Vectra Multispectral imaging 
platform. The images were quantified for per cent positive area using NIH ImageJ 
analysis software (αSMA, Pimonidazole, Slug, and CD31), per cent positive cells 
using InForm analysis software (Ki67 and CD3), or scored for intensity either 
positive or negative (αSMA/CK8 dual staining, αSMA, CK19, YFP, Zeb1, Zeb2, 
Sox4, E-cadherin and cleaved caspase-3) or on a scale of 1-3 (E-cadherin) or 1-4 
(ENT1, ENT2 and Cnt3). 

In situ hybridization. In situ hybridization (ISH) was performed on frozen 
tumour sections as previously described. In brief, 10-µm-thick sections 
were hybridized with antisense probes to Twist1 and Snail overnight at 65 °C. 
After hybridization, sections were washed and incubated with AP-conjugated 
sheep anti-DIG antibody (1:2,000; Roche) for 90 min at room temperature. 
After three washes, sections were incubated in BM Purple (Roche) until positive 
staining was seen. Digoxigenin-labelled in situ riboprobes were generated 
with an in vitro transcription method (Promega and Roche) using a PCR template. 
The following primers were used to generate the template PCR product. 
Twist1, forward, 5’-CGGCCAGGTACATCGACTTC-3’; reverse, 5’-TAATACG 
ACTCACTATAGGGAGATTTAAAAGTGTGCCCCACGC-3’; Snail, forward, 
5’-CAACCGTGCTTTTGCTGAC-3’; reverse, 5’-TAATACGACTCACTATAGG 
GAGACCTTTAAAATGTAAACATCTTTCTCC-3’. 

Gene expression profiling. Total RNA was isolated from tumours of KPC 
control, KPC;TwistcKO and KPC;SnailcKO mice (n = 3 in each group) by TRIzol 
(15596026, Life Technologies) and submitted to the Microarray Core Facility at 
MD Anderson Cancer Center. Gene expression analysis was performed using 
MouseWG-6 v2.0 Gene Expression BeadChip (Illumina). The Limma package 
from R Bioconductor was used for quantile normalization of expression arrays 
and to analyse differentially expressed genes between cKO and control sample 
groups. Gene expression microarray data have been deposited in GEO (Accession 
number GSE66981). Genes upregulated in cells acquiring an EMT program were 
expected to be downregulated in the TwistcKO and SnailcKO tumours compared to 
control tumours. 

CTC assays. Blood (200 µl) was collected from KPC;LSL-YFP and 
KPC;TwistcKO;LSL-YFP (ROSA-LSL-YFP lineage tracing of cancer cells) mice and 
incubated with 10 ml of ACK lysis buffer (A1049201, Gibco) at room temperature 
to lyse red blood cells. Cell pellets were resuspended in 2% FBS containing PBS and 
analysed for the number of YFP+ cells by flow cytometry (BD LSRFortessa X-20 
Cell Analyzer). The data was expressed as the percentage of YFP+ cells from gated 
cells, with 100,000 cells analysed at the time of acquisition. Whole blood cell pellets 
were also assayed for the expression of KrasG12D transcripts, using quantitative 
real-time PCR analyses (described below). 

Primary pancreatic adenocarcinoma cell culture and analyses. Derivation of 
primary PDAC cell lines was performed as previously described. Fresh tumours 
were minced with sterile razor blades, digested with dispase II (17105041, Gibco, 
4 mg ml−1)/collagenase IV (17104019, Gibco, 4 mg ml−1)/RPMI for 1 h at 37 °C, 
filtered by a 70 µm cell strainer, resuspended in RPMI/20% FBS and then seeded on 
collagen I-coated plates (087747, Fisher Scientific). Cells were maintained in RPMI 
medium with 20% FBS and 1% penicillin, streptomycin and amphotericin B (PSA) 
antibiotic mixture. Cancer cells were further purified by FACS based on YFP or 
E-cadherin expression (anti-E-cadherin antibody, 50-3249-82, eBioscience, 1:100). 
The sorted cells, using BD FACSAria™ II sorter (South Campus Flow Cytometry 
Core Lab of MD Anderson Cancer Center) were subsequently expanded in vitro. 
All studies were performed on cells cultivated less than 30 passages. As these are 
primary cell lines, no further authentication methods were applicable and no myco- 
plasma tests were performed. 

MTT and drug sensitivity assays. MTT assay was performed to detect cell prolif- 
eration and viability by using Thiazolyl Blue Tetrazolium Bromide (MTT, M2128, 
Sigma) following the manufacturer’s recommendations with an incubation of two 
hours at 37 °C. For the drug treatment studies, a cell line derived from each of the 
KPC, KPC;SnailcKO and KPC;TwistcKO mice was treated with 20 µM gemcitabine 
(G-4177, LC Laboratories) or 100 µM erlotinib (5083S, NEB) for 48 h. The relative 
cell viability was detected using MTT assay with a cell line derived from each of 
the KPC, KPC;SnailcKO and KPC;TwistcKO mice. n is defined as the number of 
biological replicates of a single cell line. Control conditions included 1% DMSO 
vehicle for erlotinib. The relative absorbance was normalized and control (time 
0 h or vehicle-treated) arbitrarily set to 1 or 100% for absorbance or drug survival, 
respectively. 
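
The normalization described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis script: the function name `normalize_to_control` and the absorbance readings are hypothetical, and the only assumption carried over from the text is that each reading is divided by the mean control reading, with the control set to 1 or 100%.

```python
from statistics import mean
from typing import List, Sequence


def normalize_to_control(
    absorbances: Sequence[float],
    control_absorbances: Sequence[float],
    scale: float = 1.0,
) -> List[float]:
    """Express absorbance readings relative to the mean control reading.

    scale=1.0 sets the control to 1 (proliferation readout);
    scale=100.0 sets the control to 100% (drug survival readout).
    """
    control_mean = mean(control_absorbances)
    if control_mean <= 0:
        raise ValueError("control absorbance must be positive")
    return [scale * a / control_mean for a in absorbances]


# Hypothetical readings: vehicle-treated wells and drug-treated wells.
vehicle = [0.80, 0.82, 0.78]
treated = [0.40, 0.44, 0.36]
survival_percent = normalize_to_control(treated, vehicle, scale=100.0)
```

Dividing by the mean of the control wells, rather than by a single well, keeps one noisy control reading from shifting every normalized value.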

Quantitative real-time PCR analyses (qPCR). RNA was extracted from 
whole blood cell pellets following ACK lysis using the PicoPure Extraction 
kit as directed (KIT0214, Arcturus), or from cultured primary pancreatic 
adenocarcinoma cells using TRIzol (15596026, Life Technologies). cDNA 
was synthesized using TaqMan Reverse Transcription Reagents (N8080234, 
Applied Biosystems) or High Capacity cDNA Reverse Transcription Kit 
(4368814, Applied Biosystems). Primers for KrasG12D recombination are: 
KrasG12D, forward, 5’-ACTTGTGGTGGTTGGAGCAGC-3’; reverse, 
5’-TAGGGTCATACTCATCCACAA-3’. 1/ΔCt values are presented to show 
KrasG12D expression in indicated experimental groups; statistical analyses 
were performed on ΔCt. Primer sequences for EMT-related genes are listed in 
Supplementary Table 1; GAPDH was used as an internal control. The data are presented 
as the relative fold change and statistical analyses were performed on ΔCt. 
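
As a worked illustration of the quantities named above, the sketch below computes ΔCt against a reference gene, the 1/ΔCt value, and a relative fold change. The text does not spell out the fold-change formula, so the widely used 2^(−ΔΔCt) convention is assumed here; the function names and Ct values are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt: target-gene Ct minus reference-gene (e.g. GAPDH) Ct."""
    return ct_target - ct_reference


def inverse_delta_ct(dct: float) -> float:
    """1/ΔCt, which rises as the target transcript becomes more abundant."""
    if dct == 0:
        raise ValueError("ΔCt of zero has no finite inverse")
    return 1.0 / dct


def relative_fold_change(dct_sample: float, dct_control: float) -> float:
    """Fold change by the assumed 2^(-ΔΔCt) convention."""
    return 2.0 ** (-(dct_sample - dct_control))


# Hypothetical Ct values for one target gene in a control and a cKO line.
dct_control = delta_ct(ct_target=25.0, ct_reference=18.0)  # 7.0
dct_cko = delta_ct(ct_target=26.0, ct_reference=18.0)      # 8.0
fold = relative_fold_change(dct_cko, dct_control)          # 0.5, i.e. halved
```

Running the statistics on ΔCt rather than on fold change, as the text states, keeps the tested values on the log scale on which qPCR measurement error is roughly symmetric.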
Tumour sphere assay. Tumour sphere assays were performed as previously 
described. Two million cultured primary tumour cells were plated in a low-adherence 
100-mm dish (FB0875713, Fisherbrand) with 1% FBS, Dulbecco's 
modified Eagle's medium, and penicillin/streptomycin/amphotericin. Cells were 
incubated for 7 days and formed spheres were counted at 100× magnification. 
Three, two and three cell lines were analysed for KPC control, KPC;TwistcKO and 
KPC;SnailcKO groups, respectively; five fields of view per cell line were quantified. 
MRI analyses. MRI imaging was performed using a 7T small animal MR system 
as previously described. To measure tumour volume, suspected regions 
were drawn blinded on each slice based on normalized intensities. The volume 
was calculated by the addition of delineated regions of interest in mm² × 1 mm 
slice distance. None of the mice had a tumour burden that exceeded 1.5 cm in 
diameter, in accordance with institutional regulations. All mice with measurable 
tumours were enrolled in the study (see Extended Data Table 3). Mice were imaged 
twice, once at the beginning of the enrolment (day 0), and a second time 20 days 
(day 19) afterwards. Surviving animals were euthanized at end point (day 21) for 
histological characterization. 
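
The volume calculation described above (summed regions of interest in mm² multiplied by the 1 mm slice distance) can be sketched as follows. This is a minimal illustration, not the authors' imaging software: the function names and the example areas are hypothetical.

```python
from typing import Sequence


def tumour_volume_mm3(
    roi_areas_mm2: Sequence[float], slice_distance_mm: float = 1.0
) -> float:
    """Sum the per-slice ROI areas (mm^2) and multiply by slice distance (mm)."""
    if slice_distance_mm <= 0:
        raise ValueError("slice distance must be positive")
    if any(area < 0 for area in roi_areas_mm2):
        raise ValueError("ROI areas must be non-negative")
    return sum(roi_areas_mm2) * slice_distance_mm


def relative_volume_change(volume_day0: float, volume_day19: float) -> float:
    """Fractional change in tumour volume between the two imaging sessions."""
    if volume_day0 <= 0:
        raise ValueError("baseline volume must be positive")
    return (volume_day19 - volume_day0) / volume_day0


# Hypothetical ROI areas (mm^2) delineated on three consecutive slices.
day0_volume = tumour_volume_mm3([10.0, 12.5, 8.0])  # 30.5 mm^3
```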

Statistical analyses. Statistical analyses were performed on the mean values of 
biological replicates in each group using unpaired two-tailed or one-tailed t-tests 
(qPCR only), or one-way ANOVA with Tukey’s multiple comparisons test using 
GraphPad Prism, as stipulated in the figure legends. χ² analyses, using SPSS 
tistical software, were performed comparing control to cKO groups for metastatic 
or colonization frequency across multiple histological parameters in all mice and 
mice >120 days of age in Extended Data Table 1. Fisher’s exact P value was used 
to determine significance. Results are outlined in Extended Data Table 2. Kaplan- 
Meier plots were drawn for survival analysis and the log rank Mantel-Cox test 
was used to evaluate statistical differences, using GraphPad Prism. Data met the 
assumptions of each statistical test; where variance was not equal (determined by an 
F-test), Welch's correction for unequal variances was applied. Error bars represent 
s.e.m. when multiple visual fields were averaged to produce a single value for each 
animal which was then averaged again to represent the mean bar for the group in 
each graph. P< 0.05 was considered statistically significant. 
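
To illustrate the contingency-table comparison described above, the sketch below implements a two-sided Fisher's exact test for a 2 × 2 table using only the Python standard library. The authors used SPSS; this is an independent sketch, and the function name `fisher_exact_two_sided` is hypothetical. The example counts are the liver metastasis totals reported in Extended Data Table 1 (11 of 31 control KPC mice and 6 of 14 KPC;TwistcKO mice).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of x positives in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Rows: control KPC, KPC;TwistcKO. Columns: liver-positive, liver-negative.
p_liver = fisher_exact_two_sided(11, 31 - 11, 6, 14 - 6)
```

An exact test suits these cohorts because several cells of the contingency tables hold small counts, where the χ² approximation is less reliable.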

Extended Data Figure 1 | EMT inhibition is specific to tumour 
epithelium. a, Representative images of haematoxylin and eosin-stained 
small intestine (SmInt), kidney, and heart (scale bar, 100 µm). b, Pancreatic 
mass of 29 (KPC), 13 (KPC;TwistcKO) and 28 (KPC;SnailcKO) mice, error 
bars represent s.d.; one-way ANOVA. c, Merge of Twist1 or Snail in situ 
hybridization (black) followed by CK8 (red) immunolabelling in tumours 
from KPC and KPC;TwistcKO or KPC;SnailcKO mice, respectively. White 
arrows highlight positive cells in the stroma, yellow arrows highlight 
negative epithelium (scale bar, 20 µm). d, Twist or Snail immunostaining 
in KPC and KPC;TwistcKO or KPC;SnailcKO tumours, respectively. Black 
arrows highlight positive cells in the stroma, red arrows highlight negative 
epithelium (scale bar, 20 µm). e, Channel separations of the representative 
images of αSMA immunolabelling in YFP lineage-traced tumours found 
in Fig. 1f (scale bar, 50 µm). f, EMT gene expression signature analysis in 
KPC, KPC;TwistcKO and KPC;SnailcKO cohorts (n = 3 mice). Red arrows 
indicate reduced Twist1 and Snail expression in KPC;TwistcKO and 
KPC;SnailcKO cohorts, respectively. 

Extended Data Figure 2 | General suppression of EMT markers 
does not affect desmoplasia. a, E-cadherin immunolabelling and 
quantification of primary KPC (n = 5 mice), KPC;TwistcKO (n = 5 mice) 
and KPC;SnailcKO (n = 4 mice) (scale bar, 100 µm). b, Zeb2 immunolabelling 
and quantification of primary KPC (n = 6 mice), KPC;TwistcKO (n = 5 mice) 
and KPC;SnailcKO (n = 7 mice) (scale bar, 50 µm; inset scale bar, 20 µm). 
c, Sox4 immunolabelling and quantification of primary KPC (n = 7 mice), 
KPC;TwistcKO (n = 6 mice) and KPC;SnailcKO (n = 8 mice) (scale bar, 
50 µm; inset scale bar, 20 µm). d, Slug immunolabelling and quantification 
of primary KPC (n = 4 mice), KPC;TwistcKO (n = 4 mice) and 
KPC;SnailcKO (n = 4 mice) tumours (scale bar, 50 µm; inset scale bar, 20 µm). 
e, Sirius Red staining and quantification of primary KPC (n = 21 mice), 
KPC;TwistcKO (n = 8 mice) and KPC;SnailcKO (n = 11 mice) 
(scale bar, 100 µm; error bars represent s.d.). f, αSMA immunolabelling and 
quantification of primary KPC (n = 5 mice), KPC;TwistcKO (n = 5 mice) and 
KPC;SnailcKO (n = 5 mice) (scale bar, 100 µm). g, CD31 immunolabelling 
and quantification of primary KPC (n = 4 mice), KPC;TwistcKO (n = 4 
mice) and KPC;SnailcKO (n = 3 mice) (scale bar, 200 µm; inset scale bar, 
100 µm). h, Pimonidazole staining and quantification of primary KPC (n = 4 
mice), KPC;TwistcKO (n = 4 mice) and KPC;SnailcKO (n = 4 mice) (scale 
bar, 100 µm). i, CD3 immunolabelling and quantification of primary KPC 
(n = 5 mice), KPC;TwistcKO (n = 5 mice) and KPC;SnailcKO (n = 5 mice) 
(scale bar, 100 µm; inset scale bar, 25 µm). Unless otherwise indicated error 
bars represent s.e.m., and significance determined by one-way ANOVA. 
*P < 0.05, **P < 0.01, ***P < 0.001; ns, not significant. 

Extended Data Figure 3 | EMT suppression does not alter epithelial 
characteristics of metastases. a, Immunolabelling of primary tumours 
(n = 3 mice) for αSMA (red), CK8 (green), Ki67 (white) and DAPI (blue); 
yellow arrows indicate EMT+ cells (scale bar, 20 µm). b, Representative 
dot plots of circulating YFP+ cells. c, Images of serial sections of 
KPC;LSL-YFP lung and liver metastasis stained for haematoxylin and 
eosin or immunolabelled for CK19 or YFP. Yellow dashed box represents 
magnified areas in panel below (scale bar, 200 µm; magnification scale 
bar, 50 µm). d, KPC metastatic tumours stained for Twist and Snail (n = 3 
mice; scale bar, 20 µm; inset scale bar, 10 µm). e, Zeb1 immunolabelling and 
quantification of metastatic KPC (n = 4 mice), KPC;TwistcKO (n = 3 mice) 
and KPC;SnailcKO (n = 4 mice) (scale bar, 50 µm; inset scale bar, 20 µm). 
f, αSMA immunolabelling and quantification of metastatic KPC (n = 3 
mice), KPC;TwistcKO (n = 3 mice) and KPC;SnailcKO (n = 3 mice) (scale bar, 
50 µm; inset scale bar, 20 µm). g, E-cadherin staining on serial sections of 
αSMA immunolabelling and quantification of metastatic KPC (n = 4 mice), 
KPC;TwistcKO (n = 3 mice) and KPC;SnailcKO (n = 4 mice) (scale bar, 50 µm; 
inset scale bar, 20 µm). h, Ki67 immunolabelling and quantification of 
metastatic KPC (n = 7 mice), KPC;TwistcKO (n = 3 mice) and KPC;SnailcKO 
(n = 3 mice) (scale bar, 50 µm; inset scale bar, 20 µm). Unless otherwise 
indicated error bars represent s.e.m., percentages indicated represent per 
cent decrease from control, and significance was determined by one-way 
ANOVA. *P < 0.05, **P < 0.01, ***P < 0.001; ns, not significant. 

Extended Data Figure 4 | EMT suppressed primary tumour cells have 
reduced mesenchymal markers and show resistance to chemotherapy 
in vitro. a, Bright-field micrograph of cultured primary KPC, 
KPC;TwistcKO and KPC;SnailcKO cells (scale bar, 50 µm). b, EMT- and 
gemcitabine-transport-related gene expression shown by qPCR analysis in 
KPC (n = 3-4 cell lines), KPC;TwistcKO (n = 5 cell lines) and KPC;SnailcKO 
(n = 5-6 cell lines) (error bars represent s.d., one-tailed t-test, *P < 0.05, 
numbers list non-significant P values. nd, not detected, ns, not significant). 
c, MTT assay showing cell proliferation in KPC, KPC;TwistcKO and 
KPC;SnailcKO cells (n = 8, 8 and 8 biological replicates of a cell line for 
each genotype). d, Relative cell viability (MTT assay) in cultured 
KPC, KPC;TwistcKO and KPC;SnailcKO cells treated with gemcitabine 
or erlotinib (n = 8, 8 and 8 biological replicates of a cell line for each 
genotype). Unless otherwise indicated error bars represent s.e.m., 
significance was determined by one-way ANOVA. **P < 0.01, 
***P < 0.001, ****P < 0.0001. 

Extended Data Figure 5 | EMT inhibition in KTC mice mirrors 
phenotype observed in KPC mice. a, Representative images of 
haematoxylin and eosin-stained primary tumours (scale bar, 100 µm). 
b, Relative percentage of each histological tissue phenotype of KTC (n = 8 
mice) and KTC;SnailcKO (n = 6 mice) primary tumours (error bars represent 
s.d.). c, Primary tumour invasiveness in KTC (n = 8 mice) and KTC;SnailcKO 
(n = 6 mice) (error bars represent s.d.). d, Pancreatic mass in KTC (n = 5 
mice) and KTC;SnailcKO (n = 6 mice) (error bars represent s.d.). 
e, Immunolabelling and quantification of primary KTC (n = 5 mice), 
KTC;SnailcKO (n = 4 mice) for αSMA (red), CK8 (green) and DAPI (blue); 
white arrows indicate double-positive cells (scale bar, 20 µm), Zeb1 (scale 
bar, 50 µm; inset scale bar, 20 µm), cleaved caspase-3 (scale bar, 50 µm; n = 4 
mice for both groups), Ki67 (scale bar, 100 µm), ENT2 (scale bar, 100 µm) 
and CNT3 (scale bar, 100 µm); error bars represent s.e.m. Significance was 
determined by two-tailed t-test. *P < 0.05, ***P < 0.001; ns, not significant. 


© 2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 6 | Desmoplasia is unaffected in EMT suppressed 
tumours with or without gemcitabine. a, b, Staining and quantification 
of KTC (n = 5 or 6 mice), KTC;SnailcKO (n = 4 or 5 mice), KTC plus 
gemcitabine (+ GEM; n = 4 or 5 mice), KTC;SnailcKO + GEM (n = 5 
mice) for Masson's trichrome stain (MTS) (scale bars, 100 µm), Sirius Red 
staining (scale bars, 100 µm), and ENT1 (scale bars, 100 µm). Error bars 
represent s.d. (MTS and Sirius Red) or s.e.m. (ENT1), and significance was 
determined by two-tailed t-test. ns, not significant. 

Extended Data Table 1 | Pathological spectrum of primary disease and metastasis in KPC, KPC;TwistcKO and KPC;SnailcKO cohorts 


Pathological Spectrum within cohorts 


ID AGE PDA Differentiation a eee Liver ai sae ae Moribund 


1 158 Y Ww Ss G Y ae N Y ¥ 
2 165 Y Ww G N N N N Né 
3 148 Y P Ss G N N 7 N Y 
4 135 Y M Ss G Y N Y Y Ms 
5 95 Y M G N Y N ¥ N 
6 42 Y M G N N N N Y 
7 55 Y P G Ss Y N N Y Y 
8 91 Y M G N N N N N 
9 87 Y Ww G N N N N N 
10 63 Y Pe G Y Y Y Y N 
11 108 Y P Ss G Y N N Y FD 
12 110 Y Ww G N N N N N 
13 104 Y Ww G Y N N Ys Y 
14 54 Y Ww Ss G N N N N Y 
15 108 Y P Ss G N Y N Ng ¥. 
16 42 x P Ss G N N N N Nf 
17 68 Y Ww G N N N N N 
18 107 ¥ PR G N N N N N 
19 87 as P G N N N N N 
20 48 a4 P G Ss N N N N Y 
21 109 24 P G Ss Y ¥. N Y FD 
22 81 Ms P G Y ¥. N Y Y 
23 151 nd WwW G N ¥. N Y Y 
24 47 ¥ M G Ss N N N Y Y 
25 143 ¥ P G Ss N N N Y Y 
26 122 ¥ WwW G Y N N Y N 
27 115 ¥ P. G Y Y N Y N 
28 76 ¥ Ww G N Y N Y N 
29 122 ¥: M Ss G Y N N Y Y 
30 97 Y P G N N N N N 
31 107 ¥. WwW. G N N N N N 
Totals (Median) 31/31 11/31 11/31 2/30 17/31 

% 100.0% 35.5% 35.5% 6.7% 54.8% 

1 148 Y Ww G Ss Y N N Me N 
2 151 Y la Ss G Y ¥ Y Y N 
3 140 Y P G Y bf N n 6 ¥ 
4 53 Y P G Ss N N N N Y 
5 43 Y P G N N N N Y 
6 117 Y P G Ss N N N N N 
f 90 Y P Ss G Y N N Y Y 
8 52 Y P G Ss N N N N Y 
9 104 Y P G N N N N N 
10 218 Y RP G Ss N N Y Y Y 
11 153 Y P. G N Y N Y Y 
12 45 Y P G Ss N N N N Y 
13 77 Y P G Ss Y N N ¥ Y 
14 126 Y P G iS} Y Y N vi Y 
Totals (Median) 14/14 6/14 4/14 2/14 8/14 

% 100.0% 42.9% 28.6% 14.3% 57.1% 

1 144 Y Ww G N ¥ N Me N 
2 51 Y P G Ss N N N N Na 
3 105 Y P G Ss N ¥ N Me Ns 
4 111 Y P G N N N N N 
5 106 Y P G Ss Y N Y Y =v 
6 129 Y P G N N N N N 
7 102 Y P' G Ss N nd - Y N 
8 98 Y P G Ss Y N Y Y N 
9 47 Y P G Ss N N N N Y 
10 54 Y Ww G Y mw N Y FD 
an 59 Y M G Y N N Y N 
12 103 Y P G Y N N Y N 
13 60 Y P Ss G Y N Y Y ¥: 
14 77 Y P G Y N N Y Y 
15 57 Y M Ss G Y N N n4 FD 
16 130 Y P G Y ¥ N Y FD 
17 76 Y a G Ss N N N N FD 
18 111 Y a G N ¥ N Y yy; 
19 100 Y PR G Ss Y N Y Y FD 
20 104 Y P G Ss Y N N Y Né 
21 124 Y M G N N N N FD 
22 88 Y P G Ss N N N N Y 
23 192 Y Ww G Y ¥. N Y h 4 
24 122 Y P G N N N N Y 
25 60 Y Ww G Ss N N N N Y 
26 112 x Ww G N Y N Y N 
27 48 Y P G Ss N N N N Y 
28 48 ¥ P G Ss N N N N Y 
29 124 Y P G Ss Y Y Y Y N 
30 215 Y W. G N N N N N 
Totals (Median) 30/30 13/30 9/30 5/29 18/30 

% 100.0% 43.3% 30.0% 17.2% 60.0% 


Y, yes; N, no; W, well; M, moderate; P, poor; G, glandular; S, sarcomatoid; FD, found dead; -, no tissue. 


© 2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Table 2 | Results of χ² analysis of KPC cohorts in Extended Data Table 1 


χ² Analysis 

Group Parameter Fisher's Exact P value 
Control vs. TwistcKO Early Tumor progression 0.458 
Control vs. SnailcKO 0.106 
Control vs. TwistcKO Late Tumor progression 0.458 
Control vs. SnailcKO 0.106 
Control vs. TwistcKO Sarcomatoid 0.108 
Control vs. SnailcKO 0.446 


Control vs. TwistcKO Early Tumor progression 0.580 
Control vs. SnailcKO 0.569 
Control vs. TwistcKO Late Tumor progression 0.580 
Control vs. SnailcKO 0.569 
Control vs. TwistcKO Sarcomatoid 1.000 
Control vs. SnailcKO 0.119 


Control vs. TwistcKO Liver Metastasis 0.744 
Control vs. SnailcKO 0.605 
Control vs. TwistcKO Lung Metastasis 0.743 
Control vs. SnailcKO 0.786 
Control vs. TwistcKO Spleen Invasion 0.581 
Control vs. SnailcKO 0.254 
Control vs. TwistcKO Any Metastasis 1.000 
Control vs. SnailcKO 0.797 


Control vs. TwistcKO Liver Metastasis 0.627 
Control vs. SnailcKO 1.000 
Control vs. TwistcKO Lung Metastasis 0.592 
Control vs. SnailcKO 1.000 
Control vs. TwistcKO Spleen Invasion 0.559 
Control vs. SnailcKO 1.000 
Control vs. TwistcKO Any Metastasis 0.473 
Control vs. SnailcKO 0.608 
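
For readers who wish to check a P value of this kind, a two-sided Fisher's exact test on a 2×2 table can be sketched as below. This is a minimal illustration, not the authors' analysis script. The example counts (17 of 31 versus 8 of 14) are the final totals of the first two cohorts in Extended Data Table 1; which parameter that column corresponds to is an assumption, because the column headers are not fully legible. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are cohorts; columns are positive and negative animals.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Assumed example: 17/31 positive in one cohort, 8/14 in another.
p_value = fisher_exact_two_sided(17, 31 - 17, 8, 14 - 8)
print(f"P = {p_value:.3f}")
```

The small relative tolerance in the comparison guards against floating-point ties between equally likely tables.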
Extended Data Table 3 | Survival and primary tumour burden determined by MRI in KPC, KPC;TwistcKO and KPC;SnailcKO cohorts treated 
with gemcitabine 


KPC Gemcitabine cohorts 


ID Start Age (Days) Start Volume (mm³) End Volume (mm³) Survival (Days) 
1 148 1610.4 D 7 
2 72 29.7 D 13 
3 72 439.8 902.8 21* 
4 80 44.1 D 14 
5 100 536.3 592.3 21* 
6 89 167.0 D 2 
7 94 52.7 D 7 
8 122 90.2 D 14 
9 164 217.9 D 8 
10 143 212.8 D 18 
11 84 323.8 897.2 21* 
12 58 76.7 D 4 
13 58 116.2 D 8 
Mean (Median) 301.4 797.4 
Stdev 406.9 145.1 
1 117 243.0 644.2 21* 
2 75 47.2 180.0 21* 
3 75 45.4 460.9 21* 
4 78 54.6 47.5 21* 
5 46 53.7 66.5 21 
6 96 63.1 D 13 
7 90 23.9 D 13 
8 79 101.0 D 14 
9 52 28.5 D 14 
10 52 49.4 98.706 21* 
11 104 43.4 127.0 21* 
12 104 53.5 12.1 21* 
13 68 56.7 D 15 
14 122 650.1 164.1 21* 
15 104 181.8 78.6 21* 
Mean (Median) 113.0 187.9 
Stdev 154.8 193.0 
Smail+GEM (96) ay 
1 188 255.2 D 12 
2 181 854.7 D 4 
3 127 32.0 59.6 21* 
4 127 58.7 107.4 21* 
5 142 109.8 D 14 
6 54 33.6 57.2 21* 
7 89 17.0 D 13 
8 78 54.9 39.6 21* 
9 78 3.1 D 15 
10 104 209.7 134.3 21* 
11 96 220.0 280.2 21* 
12 96 24.1 46.2 21* 
13 119 711.0 D 18 
14 126 655.6 805.4 21* 
15 119 168.6 D 18 
16 82 453.8 517.4 21* 
17 82 56.7 74.1 21* 
18 90 40.0 D 16 
19 67 80.5 D 10 
20 66 49.5 226.2 21* 
Mean (Median) 204.4 213.4 
Stdev 250.7 231.7 


D, died; *, euthanized at end point. 
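
The summary rows of the table can be reproduced from the listed values. The sketch below recomputes the mean and standard deviation for the first group only, using the start and end volumes exactly as printed. The use of the population standard deviation (`statistics.pstdev`) is an assumption: it matches the printed values for this group, but the source does not state which estimator was used.

```python
from statistics import mean, pstdev

# First group in Extended Data Table 3, values as printed (mm^3).
start_volumes = [1610.4, 29.7, 439.8, 44.1, 536.3, 167.0, 52.7,
                 90.2, 217.9, 212.8, 323.8, 76.7, 116.2]
# End volumes exist only for animals that reached the 21-day end point.
end_volumes = [902.8, 592.3, 897.2]

start_mean = mean(start_volumes)
start_sd = pstdev(start_volumes)   # population SD (assumed estimator)
end_mean = mean(end_volumes)
end_sd = pstdev(end_volumes)

print(f"start: mean {start_mean:.1f}, s.d. {start_sd:.1f}")
print(f"end:   mean {end_mean:.1f}, s.d. {end_sd:.1f}")
```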
In situ structures of the segmented genome and RNA 
polymerase complex inside a dsRNA virus 


Xing Zhang!*, Ke Ding??*, Xuekui Yu*, Winston Chang!, Jingchen Sun?* & Z. Hong Zhou!?* 


Viruses in the Reoviridae, like the triple-shelled human rotavirus 
and the single-shelled insect cytoplasmic polyhedrosis virus (CPV), 
all package a genome of segmented double-stranded RNAs (dsRNAs) 
inside the viral capsid and carry out endogenous messenger RNA 
synthesis through a transcriptional enzyme complex (TEC)!. By 
direct electron-counting cryoelectron microscopy and asymmetric 
reconstruction, we have determined the organization of the dsRNA 
genome inside quiescent CPV (q-CPV) and the in situ atomic 
structures of TEC within CPV in both quiescent and transcribing 
(t-CPV) states. We show that the ten segmented dsRNAs in CPV are 
organized with ten TECs in a specific, non-symmetric manner, with 
each dsRNA segment attached directly to a TEC. The TEC consists 
of two extensively interacting subunits: an RNA-dependent RNA 
polymerase (RdRP) and an NTPase VP4. We find that the bracelet 
domain of RdRP undergoes marked conformational change when 
q-CPV is converted to t-CPV, leading to formation of the RNA 
template entry channel and access to the polymerase active site. 
An amino-terminal helix from each of two subunits of the capsid 
shell protein (CSP) interacts with VP4 and RdRP. These findings 
establish the link between sensing of environmental cues by the 
external proteins and activation of endogenous RNA transcription 
by the TEC inside the virus. 

Each capsid of viruses in the Reoviridae contains 9-12 segmented 
dsRNAs and up to 12 TECs. These RNA-containing viruses are fully 
capable of RNA transcribing and capping. Crystal structures of the 
RdRP component of the TEC have been determined for rotavirus and 
mammalian reovirus (MRV)~°, but no high-resolution in situ struc- 
ture of the TEC is available. Moreover, the organization of TECs with 
the dsRNA genome and the mechanism of transcriptional activation 
have remained unresolved, in contrast to the well understood genome 
organization inside dsDNA viruses*”. 

With only a single protein shell that encloses ten different genome 
segments, CPV is one of the simplest dsRNA viruses® and serves as a 
model system, as highlighted by its contribution to the discovery of 
RNA capping’. To gain insight into the organization of the TEC and 
segmented dsRNA genome, we have determined CPV structure in a 
quiescent (q-CPV) state at 5.1 Å resolution (see Methods and Extended 
Data Figs 1 and 2). The structure reveals that each CPV contains ten 
TECs under ten specific positions of the twelve icosahedral vertices 
(Fig. 1). The two vertices without TECs are occupied by rod-like densities 
(Fig. 1a–e, Supplementary Video 1 and Extended Data Figs 3 and 
4). The previously ambiguous locations of TECs*” are now determined 
to be ten specific positions in each CPV particle, related by incomplete-D3 
symmetry, with only one on a ‘south tropic’ position and three 
each around the ‘north tropic’, ‘north pole’ and ‘south pole’ positions 
(Fig. 1d and Supplementary Video 2). 

Each TEC is surrounded by rod-like densities with lengths up to ~650 Å 
(Fig. 1a–c, e–f, Extended Data Fig. 4a and Supplementary Video 1). 
In most regions, these rods form parallel striations with an inter-rod 
distance of ~27 Å, as suggested previously (for example, refs 10, 11). 
Some of the rods exhibit the characteristic minor and major grooves 
typical of dsRNA duplex (Fig. 1g). We therefore interpret these rod-like 
densities as dsRNA duplexes. Unlike the model of each genome segment 
spiralling around one TEC (ref. 12), the duplexes do not spiral locally 
Figure 1 | Transcription enzyme complex and dsRNA genome 
organization inside CPV. a, Superposition of the high-resolution (3.9 Å) 
map of half a capsid (grey) and low-resolution (22 Å) map of dsRNA genome 
(radially coloured as in f) and TECs (cyan). b, c, Front (b) and back (c) views 
of the dsRNA genome and TECs of panel a. d, Earth-like representation, 
illustrating the locations of the ten TECs (surface-rendered) with pseudo-D3 
symmetry: three on each pole and the northern tropic but only one on the 
southern tropic. e, Cross-sections of the 22 Å density map, perpendicular 
to either the ‘earth axis’ in d (top row) or a D3 two-fold axis (bottom row). 
Densities of TECs are numbered as in d, and the two vertices without TEC 
but with RNA are indicated by white arrows. DC, distance from centre. 
f, Boxed region in a containing RNA threads (radially coloured as in the bar) 
and a TEC (cyan) with bound dsRNA (dashed box). g, Averaged TEC region, 
filtered to 4.5 Å and viewed as the southern-most TEC of a. The RdRP-bound 
dsRNA has the same structure in all TECs and shows major (yellow 
arrows) and minor (white arrows) grooves. 
Figure 2 | Averaged TEC map at 3.3 Å resolution and de novo modelling 
of VP4. a, Averaged map of the TEC region showing VP4 (cyan) and RdRP 
(purple), both anchored to the inner surface of the capsid (grey). b, Atomic 
model of VP4. c-f, The boxed regions in b, showing density (meshes) 
superposed with atomic models of the GTP-binding site (c), a loop (d), 
a helix (e) and an RdRP-interacting loop (f). 


around TECs (Fig. 1a–c, Extended Data Fig. 4 and Supplementary 
Video 1); instead, many extend tangentially from one TEC to another 
(for example, duplexes i-iii in Fig. 1b; see also Extended Data Fig. 4), 
indicating that each dsRNA segment is organized beyond one TEC. 
Indeed, the whole RNA genome is organized into seven to eight 
non-concentric layers with visible connections between adjacent layers 
(Fig. 1e and Extended Data Fig. 3). This extended organization of 
dsRNA is consistent with the rather long (~620 Å) persistence length of 
dsRNA? and would reduce the energy needed for genome packaging 
and transcription. One RNA duplex (the brown one in Fig. 1g) binds 
to each of the ten TECs at the same relative position and orientation, 
suggesting that this RNA duplex is a conserved feature among the ten 
dsRNA segments. However, the organization of the remaining RNA 
duplex differs among the ten TECs (Extended Data Fig. 4g). The two 
vertices without TECs are occupied only by roughly parallel dsRNA 
densities (Fig. 1a–c). 

We also obtained a 3.9 Å resolution asymmetric reconstruction 
directly from the raw images of q-CPV and subsequently used 
non-crystallographic averaging to improve the resolution to 3.3 Å for 
the TEC-containing regions (see Methods and Extended Data Fig. 5). 
The averaged map retains a short (~35 Å) RdRP-bound dsRNA density 
(Fig. 1g) and resolves the two protein components of the TEC: VP4 
and RdRP (Fig. 2a). We built a backbone model of the RdRP-bound 
dsRNA and de novo atomic models of both VP4 and RdRP (Fig. 2c-f, 
Extended Data Fig. 6 and Supplementary Videos 3-8). VP4 and RdRP 
interact extensively (Fig. 2a) with a buried interface area of ~2,800 Å². 

VP4 appears ‘L-shaped’ and consists of an amino-terminal (amino 
acids 1-252) and a carboxy-terminal (253-561) domain, with two 
unresolved/flexible segments (amino acids 23-40 and 86-131) (Fig. 2a, 
b and Supplementary Video 3). The N-terminal domain is formed by two 
small β-sheets and several α-helices, and the main body of the C-terminal 
domain is a Walker-A α/β motif, a well-known NTP-binding motif found 
in the P-loop kinase family of proteins. Sequence analysis predicted an 
NTP binding site in VP4 (refs 14, 15). Indeed, the VP4 structure contains 
a GTP molecule at the predicted NTP binding site of the C-terminal 
domain (Fig. 2c and Supplementary Video 4). We thus rename the 
C-terminal domain as the NTPase domain (Fig. 2b, c). A similar fold 
was also observed in the N-terminal α/β domain of bluetongue virus 
VP4. But, remarkably, bluetongue virus VP4 is an RNA capping enzyme 
and its α/β domain does not bind GTP (ref. 16). CPV VP4 and its homologues 
in other dsRNA viruses have been speculated to function as an 
NTPase, as an RNA 5′-triphosphatase (RTPase) or as a helicase!*!”"®, 
Our structure supports VP4 as an NTPase but shows no interaction with 
dsRNA, suggesting that VP4 is unlikely to be a helicase. Whether VP4 is 
the CPV RTPase or an RdRP regulatory factor remains to be determined. 

Like other RdRP structures**”’, the CPV RdRP contains a polymer- 
ase core with finger (amino acids 349-515, 549-641), thumb (730-863) 
and palm (516-548, 642-729) subdomains (Fig. 3a). This polymerase 
core is sandwiched between the N-terminal (1-348) and C-terminal 
bracelet (864-1225) domains (Fig. 3 and Extended Data Fig. 6). A 
GTP is identified (Figs 1g and 3a) at the position equivalent to the 
cap-binding site observed in the MRV RdRP (ref. 2). Interestingly, the 
bracelet domain of q-CPV RdRP differs from that of MRV significantly, 
despite close similarities between both their polymerase core and their 
N-terminal domains. Consequently, the crystal structure of MRV RdRP 
has an open RNA template entry channel and an accessible polymerase 
active site’; while in the q-CPV RdRP, the polymerase active site is 
covered by the bracelet domain and there is no recognizable channel 
for template entry (Figs 3a and 4a and Supplementary Videos 5 and 8). 
Since q-CPV is incapable of mRNA transcription, we considered that 
these structural differences might be characteristic of conformational 
differences between bracelet-containing RdRPs in the quiescent and 
transcribing states. 

To test this hypothesis, we then determined the structure of 
actively transcribing CPV (t-CPV), obtained an averaged TEC map at 
4.0 Å resolution, and built atomic models of VP4 and RdRP (Fig. 3b, 
Extended Data Fig. 5 and Supplementary Video 9). In t-CPV, the location 
of TECs remains the same, as do the structures of VP4 and those 
of the N-terminal and polymerase core domains of RdRP (Fig. 3a-f, 
Extended Data Figs 7-9 and Supplementary Video 10). By contrast, 
the RdRP bracelet domain undergoes major conformational change 
(Fig. 3d, e). Consistent with the above hypothesis, the in situ structure 
of the t-CPV RdRP is quite similar to the crystal structure of MRV 
RdRP in its elongation state? (Extended Data Fig. 10). 

The most significant changes of the CPV RdRP between quies- 
cent and transcribing states involve two neighbouring structural 
modules in the bracelet domain: the capsid-proximal module A 
(amino acids 1080-1140 containing helices Bα14–Bα16) and the 
Figure 3 | Comparison of RdRP in quiescent and transcribing states. 
a, b, Ribbon models of RdRP in quiescent (a) and transcribing (b) states. 
The latter contains fragments of RNA template (orange) and nascent mRNA 
(cyan) inside the active site (box). c-f, Superpositions of RdRP structures 
in quiescent (colour) and transcribing (grey) states shown in full (c) and as 
separate domains: N-terminal (d), polymerase (e), and bracelet (f) with 
modules A (yellow) and B (magenta) further highlighted on its right panel. 
g-i, Densities (grey) and models (ribbons and sticks) of nucleic acids in 
the active site of t-CPV RdRP. The fragments of the (−)RNA template and 
the nascent mRNA in the active site are modelled as a poly-G and poly-C, 
respectively. In h, a CTP is placed in the NTP-binding site and in i, the 
template and mRNA form RNA duplex in the active site of RdRP (surface-rendered 
model). 
Figure 4 | Interactions between TEC and CSPs. a, b, Conformational 
changes of modules A (yellow loops/helices as wires/cylinders) and B 
(magenta loops/helices as wires/cylinders) in quiescent (a) and transcribing 
(b) states. Module A interacts with the capsid shell, and the loop-Bα5 
fragment of module B blocks the active site (inset) in the quiescent state 
(a) but retracts to expose the active site in the transcribing (b) state (see 
Extended Data Fig. 7). c, d, The RdRP-bound dsRNA (ribbon) in the 


VP4-proximal module B (912-1010 containing helices Bα5–Bα9) 
(Fig. 4a, b and Extended Data Figs 9 and 10). Compared to that in 
q-CPV, module A in t-CPV rotates ~40° towards the capsid shell (Fig. 3f 
and Extended Data Figs 9 and 10f-k). Consistent with previous icosa- 
hedral reconstructions, our asymmetric reconstructions show that the 
capsid shell of t-CPV expands outwards from q-CPV, with the maximal 
(~10 Å) expansion occurring at the vertex region”®”!, to which module 
A of the bracelet domain is attached (Fig. 4e, f). Likewise, module B 
refolds substantially from quiescent to transcribing state, such that a 
template entry channel is formed (Fig. 4a, b) and the blockage of the 
active site by the Bα5-loop-Bα6 fragment is removed (Figs 3f, 4a, b and 
Extended Data Figs 9 and 10). 

In the quiescent state, a helical dsRNA duplex is held inside a shal- 
low cleft formed by modules A and B (Figs 1g, 4c and Extended Data 
Fig. 7d, f) through interaction between a major groove of the RNA 
duplex and residue Arg997 of module B (Fig. 4c, inset). In the tran- 
scribing state, this RNA duplex becomes detached, perhaps as a result 
of refolding the RdRP bracelet domain (Fig. 4d and Extended Data Fig. 
7e, g). We reason that detachment of the RNA duplex would permit 
RNA to slide towards the template entry channel for RNA synthesis in 
t-CPV. Indeed, in the catalytic centre of the t-CPV RdRP, we observe 
weak densities (Fig. 3b, g-i) that match the RNA duplex in the crystal 
structure of the MRV RdRP elongation complex’. We are able to place 
a 5-base-pair (bp) RNA backbone model in the active site and a CTP 
at the NTP binding site (Fig. 3g-i). 

In addition to enclosing the viral genome and anchoring TECs, the 
CSP also regulates polymerase activity in dsRNA viruses**~”». In par- 
ticular, the CSP N-terminal fragment is involved in genome replication, 
mRNA transcription and capping”>”®?”. A CSP N-terminal fragment, 
unresolved in all previous structures”!?*-9, is resolved here to form 
e, f, Interactions of CSPs (ribbons) with RdRP (purple and yellow) and VP4 
(cyan). Residues of RdRP and VP4 within 4 Å distance to the capsid shell are 
marked in red. An icosahedral five-fold axis is indicated by a green line in 
e and a green pentagon in f. Insets in f indicate two CSP N-terminal helices 
(white density with ribbon-and-stick models): one (upper) interacts only 
with RdRP while the other (lower) with both RdRP and VP4. 


a helix in the two TEC-interacting CSP subunits in both q-CPV and 
t-CPV (Fig. 4e, f). The N-terminal helix of one CSP inserts into the 
interface between the NTPase domain of VP4 and the finger subdo- 
main of RdRP (Fig. 4f, lower inset), and that of the other CSP interacts 
with the bracelet domain of RdRP (Fig. 4f, upper inset). Notably, the 
former is in proximity to the NTP-binding site of the VP4 NTPase, 
suggesting how the N-terminal fragment of CSP is positioned to 
affect TEC. In addition, the structures reveal that other regions (that 
is, areas under the vertex) of CSP also interact with module A of the 
RdRP bracelet domain (Fig. 4e, f). From quiescent to transcribing 
state, module A and the CSP regions involved in this interaction both 
undergo conformational changes. Taken together, these results point 
to a sequence of conformational changes that leads to activation of 
endogenous transcription. Specifically, environmental cues cause the 
capsid shell to expand”’, which triggers refolding of the RdRP bracelet 
domain, leading to formation of the entry channel for an RNA template 
and exposure of the polymerase active site for RNA synthesis. 

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. 

Sample preparation and cryoEM imaging. CPV particles were purified as 
described previously’. Purified polyhedra were treated at pH 10.8 with an alkaline 
solution (0.2 M Na₂CO₃–NaHCO₃) for 1 h, and then centrifuged at 10,000g 
for 40 min. The supernatant was collected and centrifuged at 80,000g for 60 min at 
4 °C to pellet the CPV virions. The resulting pellet was directly re-suspended in the 
quiescent buffer (70 mM Tris-Cl, pH 8.0, 10 mM MgCl₂, 100 mM NaCl and 2 mM 
GTP). To prepare the transcribing CPV (t-CPV) particles, 30 µl purified CPV was 
incubated in a reaction buffer (70 mM Tris, pH 8.0, 10 mM MgCl₂, 100 mM NaCl, 
and 1 mM SAM + 2 mM GTP + 2 mM UTP + 2 mM CTP + 4 mM ATP) at 31 °C for 
15 min, and then the reaction was stopped by quenching the reaction tubes on ice. 

To prepare cryoEM grids, 2.5 µl of purified CPV sample was applied to a 
Quantifoil grid (2/2), blotted for 15 s with an FEI Vitrobot in 100% humidity, and 
then plunged into liquid ethane. CryoEM images of the quiescent CPV (q-CPV) 
were collected in an FEI Titan Krios cryo electron microscope, operated at 300 kV 
with a nominal magnification of 49,000× (Extended Data Fig. 5g). The microscope 
was carefully aligned and electron beam tilt was minimized by a coma-free alignment 
procedure. Images were recorded on a Gatan K2 direct electron detection 
camera in counting mode, and the pixel size was calibrated as 1.01 Å per 
pixel on the specimen using catalase crystals. The dose rate of the electron beam 
was set to ~8 e⁻ per pixel per s, and the image stacks were recorded at 4 frames s⁻¹ 
for 3 s. The drift between frames in each image stack was corrected with the UCSF 
software*!, and the total 12 frames of each stack were merged to generate a final 
image with a total dose of ~25 e⁻ Å⁻². Contrast transfer function (CTF) parameters, 
including defocus values and astigmatism, were determined by CTFIND” 
(Extended Data Fig. 5g). 
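
The relationship between the stated imaging parameters can be checked with a short calculation. The sketch below is illustrative only and uses the approximate values quoted above (about 8 electrons per pixel per second, 1.01 Å per pixel, 4 frames per second for 3 s); the variable names are hypothetical. Because the dose rate is quoted approximately, the result should be read as roughly consistent with the stated total dose rather than as an exact reproduction.

```python
# Approximate imaging parameters quoted in the text.
dose_rate_e_per_pixel_per_s = 8.0   # "~8" in the source, so only approximate
pixel_size_angstrom = 1.01
frame_rate_per_s = 4
exposure_s = 3.0

# Number of frames per stack: frame rate multiplied by exposure time.
n_frames = int(frame_rate_per_s * exposure_s)

# Convert the per-pixel dose to a dose per square angstrom.
pixel_area_A2 = pixel_size_angstrom ** 2
total_dose_e_per_A2 = dose_rate_e_per_pixel_per_s * exposure_s / pixel_area_A2

print(f"frames per stack: {n_frames}")
print(f"total dose: {total_dose_e_per_A2:.1f} e-/A^2")
```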

Sample grid preparation, cryoEM imaging and drift correction of frames for the 
transcribing CPV (t-CPV) were performed using the same procedure described 
above for q-CPV with the exception of the camera used. The t-CPV cryoEM images 
were recorded on a new Gatan K2 direct electron detection camera attached to a 
Gatan imaging filter (GIF Quanta) with a pixel size of 1.36 Å at the specimen scale 
(Extended Data Fig. 5g). 
Asymmetric reconstruction based on original images. A total of 68,526 particles 
were selected for image processing using Frealign* and Relion*4. The 2× binned 
data set was first processed using icosahedral symmetry with Frealign*’. The centres 
of all particles were then fixed and used for the asymmetrical global search 
with Frealign using the 4× binned data set starting at 20 Å resolution. 

To generate an initial model, we placed the crystal structure of the MRV RdRP? 
under a previously obtained CPV capsid map” at the location corresponding to 
that in MRV capsid as previously reported’ and imposed a tetrahedral symmetry 
(that is, with 4 three-fold axes, 3 two-fold axes and 12 asymmetric units), resulting 
in a montage map with an empty CPV capsid containing 12 RdRPs but without 
any VP4. This montage map was filtered to 30 Å resolution and used as the initial 
model for image processing with Frealign. After 9 iterations of global search and 
2 iterations of refinement, the resolution of the density map was determined to be 
3.9 Å. In the final map, only 3 RdRPs (numbers 8–10 in Fig. 1d) remained at the 
same locations as in the initial model with the tetrahedral symmetry. 

The final map was reconstructed using the top 47,968 (70%) particles of the 
original unbinned data set. Averaging all TEC densities under different vertices 
was performed following the procedure described previously*® to improve the 
density quality and the resolution. The effective resolutions of the asymmetrical 
and averaged reconstructions were estimated to be 3.9 Å and 3.3 Å, respectively, 
based on the FSC (>0.143) and the correlation coefficient (>0.5) between the 
density map and atomic model calculated with Phenix (Extended Data Fig. 5g)°”°8. 
These estimated resolutions are consistent with the observed structural features of 
the density maps (Fig. 2, Extended Data Fig. 5e and Supplementary Videos 3-8). 
The averaged map was filtered to the spatial frequency of 1/(3.3 Å) and sharpened 
with a reverse B-factor of −120 Å². This B-factor was chosen with a trial-and-error 
method based on the optimization of noise level, backbone density continuity, and 
emergence of side-chain densities. 
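
As an illustration of how a resolution is read from a Fourier shell correlation (FSC) curve at the 0.143 criterion, the sketch below finds the first crossing by linear interpolation between sampled points. The function name and the example curve are hypothetical and synthetic; they are not data from this study, and the actual estimates here were obtained with the software cited above.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Return the resolution (in angstroms) at the first threshold crossing.

    freqs: spatial frequencies in 1/angstrom, ascending.
    fsc:   FSC values sampled at those frequencies.
    Returns None if the curve never drops from >= threshold to < threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing samples.
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Synthetic example curve (not measured data).
example_freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
example_fsc = [0.99, 0.95, 0.80, 0.50, 0.243, 0.043]
example_resolution = resolution_at_threshold(example_freqs, example_fsc)
print(f"resolution at FSC = 0.143: {example_resolution:.2f} A")
```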

Since there were no densities in the initial montage model at the VP4 locations, 
the emergence of VP4 densities in the map and the match of side-chain densities to 
those expected from the VP4 amino acid sequence (Fig. 2) provide strong internal 
controls for the validity of the high resolution cryoEM map. Consistent with this 
assessment, the locations of the RdRP in the final reconstruction are not only dif- 
ferent from those in the initial montage model, but also are related by D3 symmetry 
instead of the tetrahedral symmetry in the initial model. Most convincingly, the 
density features in the final map agree with the CPV RdRP amino acid sequence 
but differ from that of the MRV RdRP used in the initial model. 

In addition, we also performed independent reconstruction without using the 
model of the 12 MRV RdRPs, and obtained a nearly identical structure from the 
same data set. In this procedure, we first determined an icosahedral reconstruction 
without using any initial models. This icosahedral reconstruction was used 
to restrain refinement without symmetry (that is, symmetry operator is C1) to 
search for orientation around the 60 icosahedral-symmetry-related locations with 
Relion**. This independent result further validates our TEC structures. 

To obtain the 3D structure of the transcribing particles, we low-pass filtered the 
above 3D map of q-CPV to 30 Å resolution and used it as the initial model. After 
11 iterations of asymmetrical global search and 2 iterations of local refinement, 
the density map converged to a resolution of 4.8 Å, and the density quality of the 
TEC was further improved to ~4.0 Å resolution by aligning and averaging all TEC 
densities inside the asymmetric reconstruction (Extended Data Fig. 5d, f, g). 
Asymmetric reconstruction using capsid-subtracted images. To improve the 
genome structure further, we used the following procedure to carry out asymmetric 
reconstruction of q-CPV with the same particle image data set but with capsid 
contribution subtracted. As illustrated in Extended Data Fig. 1, this procedure 
includes four stages: 1, capsid subtraction in raw particle (orange); 2, initial model 
generation (green); 3, asymmetric feature emergence in Relion™ refinement (blue); 
4, orientation selection (purple). 

In the first stage (orange in Extended Data Fig. 1), we determined the orientation 
and centre parameters for each particle and obtained an icosahedral reconstruction 
with Frealign®* from raw particles with an inverse B-factor of −40 Å² 
(Extended Data Fig. 1a, b). On the basis of these parameters, a CTF-corrected 
projection (Extended Data Fig. 1c) with empirical B-factor of 160 Å² was generated. 
Next, the capsid contribution to the images was removed by subtracting the 2D 
projection corresponding to the icosahedral orientation of each image as done 
before?’ with the following improvements. To subtract the contribution from 
the capsid accurately, we determined a scaling factor between capsid projection 
(Extended Data Fig. 1c) and each raw particle image (Extended Data Fig. 1a). 
The projection and raw images were both band-pass filtered between 1/400 Å⁻¹ 
and 1/29 Å⁻¹, then radially masked based on the inner and outer diameters of the capsid 
to produce ring-shaped projection (Extended Data Fig. 1d) and raw (Extended 
Data Fig. 1e) images. The standard deviations of these ring-shaped images were 
calculated and used to normalize both the unmasked and masked (that is, ring-shaped) 
projections. The cross-correlation coefficient (0 to 1) between the ring-shaped 
raw image and the normalized ring-shaped projection was computed and 
used as the probability factor measuring the contribution of capsid signal in the 
raw particle image. Each raw image was then subtracted by the unmasked projec- 
tion multiplied by this probability factor to generate a capsid-subtracted particle 
(Extended Data Fig. 1f) for the following refinement. Particles with a probability 
factor less than 0.1 were not included in the subsequent analyses. 
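
The scaling and subtraction step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: band-pass filtering is assumed to have been applied already, the projection is normalized by matching its ring standard deviation to that of the raw image, the correlation is a Pearson coefficient clipped to the range 0 to 1, and all function names are hypothetical.

```python
import math


def ring_mask(size, r_in, r_out):
    """Boolean mask selecting pixels whose radius lies in [r_in, r_out]."""
    c = (size - 1) / 2.0
    return [[r_in <= math.hypot(x - c, y - c) <= r_out for x in range(size)]
            for y in range(size)]


def subtract_capsid(raw, proj, mask, min_factor=0.1):
    """Return (factor, capsid-subtracted image), or None if the particle is rejected."""
    pairs = [(raw[y][x], proj[y][x])
             for y in range(len(mask)) for x in range(len(mask[y])) if mask[y][x]]
    n = len(pairs)
    mean_r = sum(r for r, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r, _ in pairs) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for _, p in pairs) / n)
    if sd_r == 0.0 or sd_p == 0.0:
        return None
    # Cross-correlation coefficient within the ring, clipped to [0, 1].
    cc = sum((r - mean_r) * (p - mean_p) for r, p in pairs) / (n * sd_r * sd_p)
    factor = min(1.0, max(0.0, cc))
    if factor < min_factor:
        return None  # particles below the cut-off are excluded
    # Scale the unmasked projection to the raw image (assumed normalization).
    scale = factor * sd_r / sd_p
    residual = [[raw[y][x] - scale * proj[y][x] for x in range(len(raw[y]))]
                for y in range(len(raw))]
    return factor, residual


# Toy example: a raw image that is exactly twice the projection.
toy_proj = [[float(x + 2 * y) for x in range(8)] for y in range(8)]
toy_raw = [[2.0 * v for v in row] for row in toy_proj]
toy_mask = ring_mask(8, 1.0, 3.5)
toy_factor, toy_residual = subtract_capsid(toy_raw, toy_proj, toy_mask)
```

In the toy example the raw image contains only the scaled projection, so the factor is close to 1 and the residual is close to zero everywhere.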

In the second stage (green in Extended Data Fig. 1), the map from the above 
Frealign asymmetric refinement (Extended Data Fig. 1g) was low-pass filtered to 
60 Å resolution, masked with a 260 Å radius (Extended Data Fig. 1h), and used 
to refine the capsid-subtracted particle (Extended Data Fig. 1f) with Relion ver- 
sion 1.2. The Tau2_fudge value (T-factor) in Relion was set to 0.5. T-factor is an 
ad hoc value in Relion to tune refinement speed, and a value of 0.5 slowed down 
the refinement progression, thus ensuring the priority use of low resolution (up 
to 20 Å, such as dsRNA) data in the refinement. This refinement led to a reconstruction 
without the capsid (Extended Data Fig. 1i). This capsid-removed map 
has 12 TECs with D3 symmetry, which could be classified into two groups: the 
first group containing six better-resolved TECs close to the three-fold axis (polar), 
and the other group containing six less-resolved TECs near the equator (tropical), 
suggesting potential smear of density due to orientation mis-assignments or TEC 
flexibility/lower occupancy near the equator. 

To eliminate potential orientation mis-assignments further, we next conducted 
the third stage of data processing (blue in Extended Data Fig. 1). We first low-pass 
filtered the capsid-removed reference (Extended Data Fig. 1i) to 32 Å resolution 
(Extended Data Fig. 1j) and used it to drive Relion refinement with the capsid-sub- 
tracted particles (Extended Data Fig. 1f). The T-factor used in this refinement is 
0.1, only 2.5% of that used in Relion convention, thus ensuring slow progression 
of the refinement. Slower refinement provides time for asymmetrical features to 
emerge. Relion global search was carried out with a 3.75° angular interval, 
followed by local angular search with 1.875° interval and highly constrained 
translational search (0.7 pixel in range with 0.5 pixel interval). Asymmetrical RNA 
density feature with ten TECs emerged after ten iterations (Extended Data Fig. 1k). 
In our procedure, one way to prevent trapping into local minima in orientation 
assignment due to symmetric structural elements is to filter the current refinement 
result back to ~32 Å resolution and refine with T-factor of 0.1 again to remove 
residual symmetric feature from the working reference. This process is carried 
out iteratively. 

To improve resolution of the 3D map further, we carried out the fourth stage 
for particle orientation selection (purple in Extended Data Fig. 1). From the 
orientation of each particle determined in the high-resolution (~3 Å) icosahedral 
reconstruction (Extended Data Fig. 1b), we calculated 60 icosahedral-related ori- 
entation candidates. The task of the rest of the fourth stage of data processing is to 
select one out of these 60 orientation candidates to be the asymmetric orientation 
of the particle as done before****. To do this, we continued to run Relion refine- 
ment for 15 iterations using the above asymmetric map with 10 TECs (Extended 
Data Fig. 1k) as initial model and the orientation determined by each iteration was 
recorded, giving rise to 15 Relion orientations for each particle. For each of these 15 
Relion orientations, we calculated its angular distances to the 60 icosahedral-related 
orientation candidates, and the icosahedral-related orientation candidate with the 
smallest angular distance was selected as the working orientation for that iteration, 
resulting in a total of 15 working orientations for each particle. The particle was 
retained if 14 or all 15 of its working orientations were the same (that is, the 
selected orientation) and their averaged angular distance was less than 3 degrees. 
Otherwise, the particle was discarded. This procedure yielded a total of 11,741 
particles with selected orientation. The original raw images of these selected parti- 
cles were combined to generate an asymmetric reconstruction using Frealign and 
the resolution was determined to be 5.1 Å.
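
The selection rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. It assumes that orientations are given as 3 × 3 rotation matrices, that "angular distance" means the rotation angle between two orientations, and that the average is taken over the iterations that agree with the selected candidate; none of these details is specified in the text. The function names and the five-fold toy candidate set in the usage note are hypothetical.

```python
import math
from collections import Counter


def rot_z(deg):
    """Rotation matrix about the z axis (helper used only for toy examples)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def angular_distance_deg(r1, r2):
    """Rotation angle (degrees) taking orientation r1 to r2 (assumed metric)."""
    # trace(r1^T r2) equals the element-wise product sum of the two matrices.
    tr = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def select_orientation(iteration_orients, candidates,
                       min_agree=14, max_mean_dist=3.0):
    """Return the index of the selected candidate, or None to discard.

    iteration_orients: one orientation per refinement iteration (15 in the text).
    candidates: symmetry-related orientation candidates (60 in the text).
    """
    picks = []
    for r in iteration_orients:
        dists = [angular_distance_deg(r, c) for c in candidates]
        best = min(range(len(candidates)), key=dists.__getitem__)
        picks.append((best, dists[best]))  # working orientation per iteration
    winner, count = Counter(i for i, _ in picks).most_common(1)[0]
    if count < min_agree:
        return None
    # Assumption: average only over iterations that chose the winner.
    mean_dist = sum(d for i, d in picks if i == winner) / count
    return winner if mean_dist < max_mean_dist else None
```

With a toy set of five candidates related by 72° rotations, a particle whose 15 orientations all lie 1° from the third candidate is retained, whereas a particle with only 13 agreeing iterations, or with a 10° average offset, is discarded.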

As shown in Extended Data Fig. 2, this procedure was repeated by using a 
Gaussian ball to replace the capsid + TEC model (Extended Data Fig. 1g) in the ini- 
tial model generation stage (green in Extended Data Fig. 1). The result is the same, 
confirming that our procedure was not influenced by the choice of initial model. 
Atomic modelling and visualization. The atomic models of both RdRP and VP4 
in the quiescent state were built with Coot and refined with Phenix**, as described 
previously”. 

The atomic model of the VP4 structure was manually built with Coot. Because
no homology models of VP4 previously existed, the Cα backbone was
constructed by matching the VP4 amino acid sequence to the density map. Once
the correct placement of each residue was ensured, the backbone was converted
to a purely alanine backbone by the function ‘Mainchain’, and mutated to the
corresponding amino acids through the function ‘Mutate Residue Range’. With the
initial model now completed, the ‘Density Fit Analysis’ validation tool was used
to screen for stretches of the model that did not fit the density. When identified,
these stretches and the amino acids surrounding them were examined for any
other possible conformations that would better fit the density. Owing to the high
resolution of this structure, this was completed through the refinement tool ‘Real
Space Refine Zone’, which optimizes the fit of the model to the density map while
preserving stereochemistry. Refinement was also performed based
on the Ramachandran plot, an important indicator of three-dimensional protein
structure that validates the backbone torsion angles of a protein chain. In the
Ramachandran plot, any residues with disallowed values were selected, and the
stereochemistry of each such residue along with its surrounding residues was
optimized with the refinement tool ‘Regularize Zone’. After ideal Ramachandran
values were obtained (<1% outliers), the refinement function ‘Rotamers’ was used
to select a rotamer that best fit the density.
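
For readers unfamiliar with the quantities plotted in a Ramachandran plot, the following Python sketch computes a torsion (dihedral) angle from four atomic positions. It is a generic geometric illustration, not a description of how Coot or Phenix evaluate models; the function name is hypothetical. By the standard definition, φ of residue i is the torsion over C(i−1), N(i), Cα(i), C(i), and ψ is the torsion over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Passing the coordinates of C(i−1), N(i), Cα(i) and C(i) gives φ for residue i; the (φ, ψ) pair is then compared against the allowed regions of the plot by the validation software.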

The atomic model of the polymerase structure was also manually built with
Coot. However, since an atomic model for the MRV polymerase was available in
the Protein Data Bank (accession number 1MUK), this model was used as a
template to assist with model building through the identification of the N terminus,
C terminus and various secondary structures. Once the Cα backbone was
built by matching the polymerase amino acid sequence to the density map and
mutated to the appropriate amino acids, the model was refined with ‘Regularize
Zone’, ‘Rotamers’ and ‘Real Space Refine Zone’. The model was validated with
the Ramachandran plot and the function ‘Density Fit Analysis’. The complex
of VP4 and polymerase was then refined with Phenix, including the real space 
refinement*®. 

The atomic models of the transcribing state were built by fitting the atomic 
structures of RdRP and VP4 at quiescent state into the density, manually adjusting 
the changed residues with Coot, and refining the models with Phenix*. 

Visualization, segmentation of density maps, and generation of videos were 
done with UCSF Chimera*!. 
Extended Data Figure 1 | Illustration of the asymmetric reconstruction procedure using particles with the capsid density subtracted. See Methods for
full explanation of panels a–m.
Extended Data Figure 2 | Validation of asymmetric reconstruction from capsid-subtracted images using a Gaussian ball as the initial model. Arrows
linking a to f represent the progression of the procedure. The top panels (a, c, e) show the input model for each run and the bottom panels (b, d, f) show the
output of each run.
Extended Data Figure 3 | Sections of the q-CPV density map along the three-fold (that is, the earth axis) (a) and two-fold (b) axes of the pseudo-D3
symmetry. Note the lack of three-fold and two-fold symmetry in the RNA density in contrast to the perfect symmetry of the capsid shell proteins. Pixel
size = 4.04 Å; clipped map size = 166 × 166 × 120 pixels.


© 2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 4 | dsRNA density maps in the quiescent state.
a, View of TEC + RNA densities with the same orientation of Fig. 1d.
b, c, The same view as in a but rotated by +90° (b) or −90° (c) along x axis
in panel a to view from either north (b) or south (c) poles. d–f, Three views
from three two-fold axes on the equator, each is rotated by 120° along the
y axis from each other. g, dsRNA density maps at the twelve vertices. TECs
are arranged and numbered according to Fig. 1d. First row, TECs 1, 2, 3;
second row, TECs 4, 5, 6; third row, TEC 7 and two unoccupied positions;
fourth row, TECs 8, 9, 10. All TECs have a dsRNA segment bonded at the
flange, each marked with a black arrow. Unlike the polar TECs, each tropical
TEC (4–7) is surrounded by an extra density rod (open arrow).
Extended Data Figure 5 | CryoEM reconstructions of CPV in the
quiescent and transcribing states. a, b, CryoEM images of CPV particles
in quiescent (a) and transcribing (b) states. These images were obtained
by aligning and averaging frames in direct electron counting image stacks.
Fibre-like nascent mRNAs are visible over background in b (marked
by green arrows), while the background in a is clean. c, d, Fourier shell
correlation coefficients (FSCs) as a function of spatial frequency between
two half maps for reconstructions in the quiescent (c) and transcribing
(d) states. The black and red lines represent FSCs for the asymmetrical
reconstructions of capsid + genome and the locally averaged TEC densities,
respectively. The effective resolutions of the local averaged maps are ~3.3 Å
(c) and ~4.0 Å (d) resolution (FSC>0.143) for maps in the quiescent
and transcribing, respectively. e, f, CryoEM densities (grey surface
representations) superimposed with atomic models (ribbons and sticks)
for the quiescent (e) and transcribing (f) states. The α-helix (Pα12) and the
four-stranded β-sheet (P4, P7–8 and P11) in e and f are both from the palm
subdomain of the polymerase domain at 3.3 Å (e) and ~4.0 Å (f) resolutions.
g, Statistics of CPV reconstructions and atomic model refinement.
Extended Data Figure 6 | Sequence and secondary structure assignment of CPV RdRP in the quiescent state. α-helices were marked by cylinders,
β-strands by arrows, loops by thin lines, and the flexible tip domain by dashed lines. The colour scheme is the same as Fig. 3a.
Extended Data Figure 7 | The RdRP-bound dsRNA in the quiescent and
transcribing states. a–c, Location of a TEC on the inner surface of the
capsid shell in the quiescent and transcribing states. The inner surface of
the CPV capsid (a) with 10 CSPs labelled (CSP A.1/B.1 to CSP A.5/B.5).
b, c, Position of a TEC on the inner surface of capsid in the quiescent (b)
and transcribing (c) states. VP4 and RdRP are coloured cyan and purple,
respectively. An icosahedral five-fold axis is indicated with a small green
pentagon. d, e, CryoEM densities of TEC and dsRNA (orange) in the
quiescent (d) and transcribing (e) states. f, g, Models of TEC (surface
representation) and dsRNA (ribbons) in the quiescent (f) and transcribing
(g) states. Close-up views show the bound dsRNA (surface representation)
on RdRP in the quiescent state (f) and its detachment in the transcribing
state (g). VP4 is coloured cyan and the RdRP is coloured as in Fig. 3a. All
surfaces displayed in this figure were rendered from models, except for the
density maps of RdRP + dsRNA in d, e.
Extended Data Figure 8 | Tracing amino acid residues 910-932 and
971-1000 of module B of the bracelet domain of RdRP in the quiescent
and transcribing states. a, b, CryoEM densities of RdRP in the quiescent
(a) and transcribing (b) states. The locations of the residues 910-932 and
971-1000 are indicated with cyan boxes in a and b. Owing to their flexibility,
these residues are not readily visible when displayed as in a and b but
become visible when the maps are filtered to a lower resolution (for example,
4.5 Å resolution) as in c-f. The colour scheme of domains/subdomains is the
same as in Fig. 3a. c, Trace of the residues 971-1000 (green) and 910-932
(purple) of module B of the bracelet domain of RdRP in the quiescent state.
d, The same as c but in a different view. e, Trace of the residues 971-1000
(green) and 910-932 (purple) of module B of the bracelet domain of RdRP
in the transcribing state. f, The same as e but in a different view to show the
unambiguous trace of the two peptide fragments. g, h, Trace of the residues
910-923 (g) (purple) and 926-932 (h) (purple) of the bracelet domain of
RdRP in the transcribing state, showing the unambiguous trace of the two
peptide fragments. i, j, CryoEM densities (grey) and model (ribbon) of
RdRP in the transcribing state, showing α-helices (i) and a β-hairpin (j). The
colour scheme of domains/subdomains is the same as in Fig. 3a. k, l, Trace
of the residues of the bracelet domain of RdRP in the transcribing state,
showing an α-helix (k) and a β-sheet (l).


Extended Data Figure 9 | Stereo and rotated views of Fig. 4a, b. a, b, Stereo views of modules A (yellow cylinders and loops) and B (purple cylinders and 
loops) of the bracelet domain of RdRP in the quiescent (a) and transcribing (b) states. c, d, Same as in a, b, but rotated around the x axis by 90°. All surfaces 
displayed in this figure were rendered from models. 
Extended Data Figure 10 | Comparisons of RdRPs from CPV and MRV.
a, b, CryoEM in situ structure of the RdRP in t-CPV (a) and crystal structure
of the MRV RdRP (b), both containing an RNA duplex in the active site. c-e,
Superposition of domains of RdRPs from t-CPV (colour) and MRV (grey):
N-terminal (c), polymerase (d) and bracelet (e) domains. f-h, Comparisons
of modules A (yellow) and B (magenta) of the bracelet domain of RdRPs
from q-CPV (f), t-CPV (g) and MRV (h). i-k, The same as in f-h, but with
helices shown as cylinders, as in Fig. 4a, b.
Foreign DNA capture during CRISPR-Cas adaptive immunity

James K. Nuñez, Lucas B. Harrington, Philip J. Kranzusch, Alan N. Engelman & Jennifer A. Doudna


Bacteria and archaea generate adaptive immunity against phages 
and plasmids by integrating foreign DNA of specific 30-40-base- 
pair lengths into clustered regularly interspaced short palindromic 
repeat (CRISPR) loci as spacer segments’ °. The universally 
conserved Cas1-Cas2 integrase complex catalyses spacer
acquisition using a direct nucleophilic integration mechanism 
similar to retroviral integrases and transposases’ '*. How the 
Cas1-Cas2 complex selects foreign DNA substrates for integration 
remains unknown. Here we present X-ray crystal structures of the 
Escherichia coli Cas1-Cas2 complex bound to cognate 33-nucleotide 
protospacer DNA substrates. The protein complex creates a curved 
binding surface spanning the length of the DNA and splays the 
ends of the protospacer to allow each terminal nucleophilic 3’-OH 
to enter a channel leading into the Cas1 active sites. Phosphodiester
backbone interactions between the protospacer and the proteins 
explain the sequence-nonspecific substrate selection observed 
in vivo’ *. Our results uncover the structural basis for foreign DNA 
capture and the mechanism by which Cas1-Cas2 functions as a 
molecular ruler to dictate the sequence architecture of CRISPR loci. 

CRISPR loci are defined by repetitive elements that are separated by 
similarly sized spacer sequences acquired from foreign DNA during 
the adaptation stage of CRISPR-Cas adaptive immunity®™*. CRISPR 
transcripts generated from the loci assemble with Cas proteins to detect 
and cleave foreign nucleic acids bearing sequence complementarity 
to the spacer segment’”’”"””. In E. coli, expression of the Cas1-Cas2 
protein complex triggers acquisition of new 33-base-pair (bp) spacers at 
the A/T-rich leader end of the CRISPR locus’ "°°. How the Cas1-Cas2 
complex selects 33-bp protospacers of variable sequences and activates 
the 3’-OH ends for integration remains unknown. As the Cas1-Cas2 
complex is sufficient to initiate spacer acquisition and adaptation of 
the CRISPR-Cas immune system, we hypothesized that the protein 
complex alone must provide the structural basis for the unknown 
mechanism of spacer length determination. 

To determine how protospacer variation influences the efficiency of 
Cas1-Cas2-mediated spacer acquisition, we used an in vitro integration 
assay to test versions of a 33 bp sequence with constant overall length 
but different 3’ single-stranded overhang lengths’. The protospacer 
sequence is derived from the M13 bacteriophage genome and is highly 
acquired into the E. coli CRISPR locus after infection®. Unexpectedly, 
protospacers with overhanging 3’ nucleotides are strongly preferred by 
the Cas1-Cas2 complex over a completely double-stranded 33 bp protospacer
(Fig. 1a and Extended Data Fig. 1a, b). Single-stranded DNA
and substrates with 5’ overhangs are poor substrates for integration,
highlighting the ability of Cas1-Cas2 to select specific DNA substrates 
before integration’. The most preferred protospacer DNA for in vitro 
integration consists of five overhanging nucleotides on each 3’ end 
(Extended Data Fig. 1). To determine the molecular basis of Cas1-Cas2
protospacer capture, we assembled Cas1-Cas2 complexes with the preferred
protospacer substrate and determined crystal structures of the
complex in the presence and absence of Mg²⁺ at 3.0 Å and 3.2 Å resolutions,
respectively (Extended Data Fig. 2 and Extended Data Table 1).

The structures reveal a hexameric protein architecture comprising 
four copies of Cas1 and two copies of Cas2, in which the protospacer 
spans the central Cas2 dimer and terminates within individual Cas1 
subunits on each end of the complex (Fig. 1b). Structural superposition 
of the Cas1-Cas2 complex with and without bound DNA reveals a
DNA-induced change in Cas1 subunit orientation in which each Cas1
dimer rotates ~10° in opposing directions against the central Cas2 hub
(Extended Data Fig. 3a, b). Cas1-Cas2 protospacer capture positions
each single-stranded protospacer 3’ end within a channel leading
directly to a Cas1 active site. Simulated annealing omit maps show clear
electron density for the double-helical region and the five-nucleotide 
overhangs on each end of the protospacer (Extended Data Fig. 4a-c). 
The constrained protein channel guiding each DNA strand from its 
Figure 1 | Overall architecture and active site positioning of 3’-OH
nucleophile. a, A representative agarose gel of in vitro integration reactions 
using increasing lengths of 3’ single-strand (ss) protospacer DNA overhangs. 
Per cent integration values are the average of three independent experiments. 
kb, kilobases; nt, nucleotide; S.C., supercoiled pCRISPR; Band X, relaxed 
pCRISPR byproduct (ref. 12). b, The overall architecture of Cas1-Cas2 bound 
to protospacer DNA. The line segments indicate the length of the DNA, 
spanning a total of 33 nucleotides. c, Stick configurations of the two Cas1 
active sites (blue subunits in b) that coordinate the nucleophilic 3’-OH ends 
of the protospacer (green arrow). Supplementary Information contains the 
full image for a. 
Figure 2 | Coordination of protospacer DNA within the complex.
a, Electrostatic potential surface representation of the Cas1-Cas2 complex
with the protospacer shown in yellow. b, Close up of the arginine channel
that stabilizes the ssDNA overhang. c, Stick configuration representation
of arginine clamp residues that coordinate the protospacer duplex
region. d, Map of amino acid residues that coordinate the protospacer
phosphodiester backbone (black dots). Residue colours indicate
Cas1-Cas2 protomers from Fig. 1b. e, Agarose gels of in vivo spacer
acquisition assays of arginine channel and clamp mutant proteins.


double-helical region to the single-strand-accommodating Cas1 active
site explains the specificity of Cas1-Cas2 for five-nucleotide 3’ overhang
substrates (Fig. 1a and Extended Data Fig. 1). Two of the four
Cas1 subunits, coloured green in Fig. 1b, are not occupied with the
protospacer 3’ ends and are probably non-catalytic, since the 3’-OH
nucleophile and the scissile phosphodiester bond of the target DNA
must be in the same active site for direct nucleophilic integration.

In the active sites, the 3’ terminal base is involved in a stacking inter- 
action with Y217 that positions the nucleophilic 3’-OH ends of the 
protospacer near the conserved metal-binding residues E141, H208 
and D221 (Fig. 1c). Although we cannot assign density for Mg²⁺ in
the active sites, these three residues have been shown previously to
coordinate a Mn²⁺ ion in the active site of Cas1 from Pseudomonas
aeruginosa’. Furthermore, alanine mutations at these positions dis- 
rupt in vivo spacer acquisition”*””. Thus, the observed positioning of 
the 3’-OH nucleophiles and catalytic residues probably represents the 
active configuration of the nucleoprotein complex immediately before 
spacer integration. 

All interactions between Cas1-Cas2 and protospacer DNA involve 
coordination of the phosphate backbone rather than base-specific con- 
tacts, consistent with the variable sequence selection of protospacers 
that is essential for resistance to diverse foreign sequences” *. Two central
regions of the Cas1-Cas2 complex, which we term the ‘arginine
clamp’ and the ‘arginine channel’, stabilize the protospacer (Fig. 2a–d).
The arginine clamp interacts with the middle of the duplex region
where four Arg residues coordinate each DNA strand: Cas1 R41 and
Cas2 R16, R77 and R78 (Fig. 2c). Reverse charge mutations of Cas1 R41
and Cas2 R16 and R78 drastically reduce spacer acquisition in vivo,
whereas the Cas2 R77E mutant functions similarly to wild-type Cas2
(Fig. 2e). Thus, Cas1 R41, Cas2 R16 and R78 are the key constituents of
the arginine clamp. The contribution of Cas2 to protospacer DNA bind- 
ing supports the previous hypothesis that the main function of Cas2 
is to form a non-catalytic scaffold within the Cas1-Cas2 complex’. 

Cas1 residues R66, R84, R245 and R248 line the arginine channel 
that stabilizes the junction where the duplex region terminates and the 
single-stranded DNA overhang enters the active site. Reverse charge
mutations of each arginine lining the arginine channel disrupt spacer
acquisition in vivo (Fig. 2e). In addition, purified Cas1 R59D or R66D
proteins complexed with wild-type Cas2 are highly defective in integrating
33-bp duplex or five-nucleotide overhang protospacer substrates
in vitro (Fig. 2f). Fluorescence polarization assays demonstrate
that the mutant complexes exhibit dramatically reduced affinity for 
WT, wild type. f, Plot of per cent in vitro integration of either double-stranded
DNA (dsDNA; black) or 5-nucleotide (nt) overhang (blue) protospacers
with wild-type Cas1, Cas1(R59D) or Cas1(R66D) complexed with Cas2.
g, Fluorescence polarization binding assays of a 5-nucleotide overhang
protospacer with the same mutants as in f complexed with Cas2. The calculated
relative binding affinities (Kd) are indicated. Error bars represent the standard
deviation of three independent experiments. Data in panels e–g are results of
at least three biological replicates. Supplementary Information contains the
full images for e.


protospacer DNA, highlighting the critical role of this part of the Cas1- 
Cas2 complex for protospacer capture and complex stability (Fig. 2g). 

The Cas1-Cas2-DNA crystal structures uncover a protein wedge 
that terminates the protospacer double-stranded DNA region and 
allows single-stranded DNA overhangs to enter the arginine channel. A 
stacking interaction of the 5’ terminal base (adenine 6 in Fig. 3a, b) with 
Y22 of Cas1 stabilizes protospacer duplex unwinding, directing each
single-stranded 3’ overhang to sharply bend ~90° away from the duplex
and into the active site channel (Fig. 3b). A mutation of Y22 to alanine
reduces spacer acquisition in vivo, whereas a phenylalanine mutation
has near wild-type levels of acquisition, consistent with a specific role
for Cas1 Y22 base-stacking in protospacer strand splaying (Fig. 3c).
Figure 3 | Mechanism of protospacer DNA end separation. a, The
5-nucleotide splayed protospacer sequence used for crystallization to
determine the trajectory of the displaced non-nucleophilic strand. Cas1 Y22,
involved in base stacking at the fork, is shown in blue. b, Close up of the DNA
fork showing the base stacking interaction of Y22 with the terminal adenine
nucleotide of the non-nucleophilic strand. The nucleotides are numbered
from 5’ to 3’ of each DNA strand shown in a. The grey mesh shows the
2Fo − Fc density contoured at 2.2σ of the first ejected nucleotide of the
displaced strand. The arrows indicate the opposite trajectories of each strand.
c, Agarose gel of in vivo acquisition assay of co-expressed wild-type (WT)
Cas1 or the indicated Cas1 mutant with Cas2. Quantification is the mean
of three independent experiments ± standard deviation. d, Plot of per cent
integration of increasing number of splayed nucleotides at the protospacer
ends using wild-type Cas1 (blue) or Cas1(Y22A) (blue) complexed with
Cas2. Error bars represent the standard deviation of three independent
experiments. Supplementary Information contains the full image for c.
Figure 4 | Model of protospacer DNA integration. a, View of crystal
packing from a symmetry mate complex (grey) showing coordination of the
symmetry DNA along a Cas1 active site. The inset is a magnified view of the
coordination of the phosphodiester backbone with metal-binding residues
E141, H208 and D221. The mesh represents an Fo − Fc density for a Mg²⁺ ion
contoured at 2.2σ. b, c, Model of protospacer DNA integration into target
DNA (black) and positioning of the scissile phosphate (green arrow) and the
3’-OH nucleophile in the Cas1 active site.


Sequence alignment of representative Cas1 proteins in type I CRISPR 
systems reveals that Y22 is not universally conserved in other bacteria, 
suggesting that additional or different Cas1 residues may stabilize the 
splayed ends in other CRISPR-Cas systems (Extended Data Fig. 5). 

The observed stacking interaction raises the possibility that fully 
duplexed protospacers are separated by Cas1 Y22, thereby displacing 
the 5’ end of the duplex, which we term the non-nucleophilic strand, 
from the nucleophilic strand carrying the 3’-OH. DNA transposases 
and retroviral integrases also utilize end fraying to isolate the reactive 
DNA strands for chemistry within enzyme active sites” ~*. To test this 
potential activity of Cas1-Cas2, we introduced an increasing number 
of mismatches at the ends of the 33 bp protospacer to disrupt end base 
pairing and assayed their potential for in vitro integration (Fig. 3d and 
Extended Data Fig. 6a, b). Similar to the 3’ overhang substrates, the 
4- and 5-nucleotide frayed ends are highly preferred, presumably due 
to the lower energy required for capture of these substrates compared 
to perfectly duplexed ends (Fig. 3d). The complex containing the Cas1 
Y22A mutant regains marginal activity with substrates containing 
5- or 6-nucleotide splayed ends, suggesting that Y22 steers the 
non-nucleophilic DNA strand away from the active site (Fig. 3d). 
Notably, the displaced non-nucleophilic strand is not cleaved into a
shorter fragment by Cas1-Cas2, as the protospacer ends are not
processed during integration (Extended Data Fig. 6c).

To determine the trajectory of the displaced non-nucleophilic strand 
after end-splaying, we crystallized Cas1-Cas2 with a protospacer with 
five-nucleotide frayed ends on both sides (Fig. 3a, b). The electron 
density at the fork is similar to the structures described above, except 
that we observe the first nucleotide of the displaced non-nucleophilic 
strand pointing in the opposite direction from the nucleophilic 
single-stranded DNA strand. Clear electron density is not observed 
for the remaining nucleotides of the displaced strand, indicating that 
they are not stabilized by the complex. 

An alternative crystal form grown in the presence of Mg²⁺ reveals
secondary Cas1-DNA interactions that provide additional insight into
the mechanism of Cas1-Cas2 genomic DNA target binding and subsequent
integration. In addition to the two Cas1 ‘catalytic’ active sites
carrying the 3’-OH ends of the protospacer, the ‘non-catalytic’ Cas1
active sites interact with the protospacer DNA from a symmetry mate,
revealing a possible coordination of the target DNA during integration
(Fig. 4a and Extended Data Fig. 7a). The non-catalytic Cas1 engages
the DNA minor groove by contacts with α-helix 7, causing a slight kink
in the DNA compared to our alternative crystal form lacking Mg²⁺
(Extended Data Fig. 7b). A close-up of the active site shows continuous
density for Mg²⁺ with E141, H208, D221 and a phosphate backbone of
the presumed target DNA, capturing a snapshot of scissile phosphodiester
bond coordination before integration (Fig. 4a).

Because integration must occur in the active site that coordinates the
3’-OH of the protospacer DNA, we modelled the protein-DNA interactions
from the non-catalytic Cas1 active sites into the catalytic Cas1
active sites. This reveals the positioning of the nucleophilic 3’-OH of the
protospacer ends for attacking the scissile phosphodiester bond in the
modelled DNA (Fig. 4b, c). Further work will be needed to shed light 
on how the complex specifically recognizes the leader-repeat region 
of the CRISPR locus for integration, as recently observed in vitro!!"9, 

Together, these data explain key aspects of Cas1-Cas2 integrase- 
mediated acquisition of new DNA into bacterial genomes. First, we 
show that the substrates for integration are double-stranded DNA. 
Importantly, however, optimal substrates include a central 23 bp hel- 
ical region flanked by five single-stranded nucleotides on each 3’ end. 
If substrates for CRISPR integration come from single-stranded DNA 
products of RecBCD, as recently suggested, they must somehow anneal 
or otherwise become double stranded before Cas1-Cas2 capture”. It 
remains unclear how the Cas1-Cas2 complex recognizes the AAG 
protospacer adjacent motif during protospacer selection, since the 
terminal nucleotides containing the 3’-OH nucleophiles are coordinated
similarly in the Cas1 active sites (Fig. 1). Second, the Cas1-Cas2
integrase architecture specifies the precise length of integrated DNA, 
ensuring uniformity of spacer lengths within CRISPR loci. Finally, the 
structure-based model of DNA target sequence positioning suggests that 
in addition to catalysing the integration reaction, Cas1 plays a role in 
binding the target CRISPR locus. Target binding could possibly disrupt 
the structural symmetry observed in the crystal structure to coordinate
the sequence-specific integration reactions at the leader end of the
CRISPR locus. Insights into target site recognition may offer strategies 
for altering or enhancing integration site specificity, with implications 
for use of the Cas1-Cas2 integrase as a genome-modifying technology. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cas1, Cas2 and DNA preparation. The Cas1 and Cas2 proteins from E. coli 
K12 (MG1655) were cloned and separately purified as previously described’®. 
Single-stranded DNA (ssDNA) oligonucleotides purchased from Integrated 
DNA Technologies were annealed in 20 mM HEPES-NaOH, pH 7.5, 25 mM KCl,
10 mM MgCl₂ by heating at 95 °C for 3 min and slow cooling to room temperature.
The pCRISPR DNA target for in vitro integration was constructed as previously 
described”. The DNA substrates used for crystallization were gel-purified before 
complex formation. The sequences for the five-nucleotide overhang substrates
used for crystallization are: ssDNA1, 5’-ATTTACTACTCGTTCTGGTGTTTCTCGT-3’;
and ssDNA2, 5’-AAACACCAGAACGAGTAGTAAATTGGGC-3’. The
sequences for the five-nucleotide splayed substrates are: ssDNA1,
5’-TAAACATTTACTACTCGTTCTGGTGTTTCTCGT-3’; and ssDNA2,
5’-CATCTAAACACCAGAACGAGTAGTAAATTGGGC-3’.
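
As a consistency check on the oligonucleotide sequences listed above, the following Python sketch finds the base-paired region formed when the two strands of each substrate anneal, and reports the unpaired nucleotides at each end. It is an illustrative calculation only, with hypothetical function names; it assumes simple Watson-Crick pairing over the longest contiguous complementary stretch and does not model thermodynamics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def describe_duplex(top, bottom):
    """Paired length and unpaired ends for two strands written 5' to 3'."""
    rc = reverse_complement(bottom)
    length, i, j = longest_common_substring(top, rc)
    return {
        "duplex_bp": length,
        "top_5p_unpaired": top[:i],
        "top_3p_unpaired": top[i + length:],
        # The first j bases of rc correspond to the last j bases of bottom.
        "bottom_3p_unpaired": bottom[len(bottom) - j:],
        "bottom_5p_unpaired": bottom[:len(rc) - (j + length)],
    }


overhang = describe_duplex("ATTTACTACTCGTTCTGGTGTTTCTCGT",
                           "AAACACCAGAACGAGTAGTAAATTGGGC")
splayed = describe_duplex("TAAACATTTACTACTCGTTCTGGTGTTTCTCGT",
                          "CATCTAAACACCAGAACGAGTAGTAAATTGGGC")
```

For the overhang substrate this gives a 23-bp duplex with five unpaired nucleotides at each 3’ end and none at the 5’ ends, so the substrate spans 33 nucleotides in total. The splayed substrate has the same 23-bp core with five unpaired nucleotides on both strands at each end.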

In vivo acquisition and in vitro integration assays. The in vivo acquisition 
assays were performed as previously described’. The in vitro integration reac- 
tions were conducted as previously described with slight modifications’”. After 
pre-incubation of equimolar Cas1 and Cas2 at 4°C, 100 nM of the resulting Cas1- 
Cas2 complex was incubated with 100 nM protospacer DNA for an additional 
10-15 min at room temperature. The integration reaction was activated by the 
addition of 300 ng (~5nM) pCRISPR, incubated at 37°C for 1h and quenched with 
DNA loading buffer supplemented with EDTA at a final concentration of 20mM. 
The reaction products were analysed on 1.5% agarose gels. Per cent integration 
activity values were determined by quantifying the band intensity of the relaxed
pCRISPR product and dividing it by the summed intensity of all bands detected by Image
Lab Software (Bio-Rad). We note that the integration activity could be a mixture 
of half-site and full-site integration products, as described previously’. 
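
The per cent integration calculation described above amounts to a simple ratio, sketched here in Python. The function names and the band intensities in the example are hypothetical and are not taken from the paper; the sketch assumes that background-corrected band intensities have already been exported from the gel-analysis software.

```python
from statistics import mean, stdev


def percent_integration(relaxed_intensity, all_band_intensities):
    """Relaxed-product intensity as a percentage of the total lane intensity."""
    total = sum(all_band_intensities)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * relaxed_intensity / total


def summarize(replicates):
    """Mean and sample standard deviation over replicate lanes.

    replicates: list of (relaxed_intensity, all_band_intensities) tuples.
    """
    values = [percent_integration(r, bands) for r, bands in replicates]
    return mean(values), stdev(values)


# Hypothetical intensities for three replicate lanes (arbitrary units);
# each lane lists every detected band, including the relaxed product.
example = [
    (30.0, [30.0, 70.0]),
    (25.0, [25.0, 75.0]),
    (35.0, [35.0, 65.0]),
]
avg, sd = summarize(example)
```

Because the relaxed band may contain both half-site and full-site products, as noted above, the resulting percentage reports total integration activity rather than either product alone.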
Complex formation, crystallization and structure determination. Purified Cas1 
and Cas2 were incubated with protospacer DNA at equimolar concentrations
(50 µM) in buffer A (500 mM KCl, 20 mM HEPES-NaOH, pH 7.5, 1 mM DTT,
10 mM EDTA), followed by overnight dialysis at 4 °C against buffer B (100 mM
KCl, 20 mM HEPES-NaOH, pH 7.5, 1 mM DTT, 5 mM EDTA). The dialysed sample
was applied on a Superdex 75 10/300 column (GE Healthcare) in buffer B.
Peak fractions were pooled and concentrated to ~3 mg ml⁻¹ for crystallization.
Optimized crystals were grown by hanging-drop vapour diffusion at room
temperature in two different conditions, as described in the text. The
Mg²⁺-containing crystals grew with gem-like morphologies in 50 mM MES, pH 6.1,
10% isopropanol and 20 mM MgCl₂. The Mg²⁺-free crystals grew as rods in 100 mM
sodium citrate tribasic pH 5.6, 200 mM sodium acetate and 8% PEG 8000 (w/v).
The crystals were briefly transferred into a drop containing either 25% ethylene
glycol (Mg²⁺-containing crystals) or 30% glycerol (Mg²⁺-free crystals) for
cryoprotection and frozen in liquid nitrogen. The Cas1-Cas2 complex with a
splayed DNA substrate crystallized in the same conditions as the Mg²⁺-free crystals.




X-ray diffraction data were collected under cryogenic conditions at beamline
8.3.1 at the Lawrence Berkeley National Laboratory Advanced Light Source.
Initial phases were obtained by sequential molecular replacement using individual
protein components of the Cas1-Cas2 apo structure (Protein Data Bank (PDB)
accession number 4P6I) as search models. Following initial placement of two Cas1
dimers and a dimer of Cas2, phases were improved by performing one round of
rigid-body refinement in PHENIX. The resulting maps showed clear unbiased
density for protospacer DNA, and subsequent model building was performed
through iterative rounds of building in Coot and refinement in PHENIX with
NCS restraints on the protein subunits. The asymmetric unit of each of the three
structures contains one copy of the Cas1-Cas2 complex bound to protospacer DNA.
Statistics for the final crystal structures are reported in Extended Data Table 1.
The final structures are missing clear density for the loop connecting α6 and α7
of Cas1. We assume this loop to be highly disordered as it is also not observed
in the apo E. coli Cas1 crystal structure (PDB 3NKD) or the apo Cas1-Cas2
complex (PDB 4P6I).

Fluorescence polarization. Fluorescence polarization assays were performed in
20 mM HEPES-NaOH, pH 7.5, 25 mM KCl, 5 mM EDTA, 1 µg ml⁻¹ BSA and 1 mM
DTT. Cas1-Cas2 complexes were assembled and purified by gel filtration for all
binding assays. The 3′-fluorescein-labelled DNA substrate was added to the protein
solution at a final concentration of 5 nM and the DNA-protein mixture was allowed
to incubate for 30 min at 22 °C. Measurements were made by excitation at 485 nm
and monitoring emission at 535 nm. Data were fit to a binding isotherm to obtain
Kd. Each experiment was conducted in triplicate and error bars represent the
standard deviation.
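
The text does not state which isotherm or fitting software was used, so the following is only a minimal sketch under stated assumptions: a one-site model, FP = FP_free + (FP_bound − FP_free)·[P]/(Kd + [P]), which neglects depletion of protein by the 5 nM labelled DNA. The function names, units and data are hypothetical. For any fixed Kd the model is linear in its two plateau values, so the sketch solves those in closed form and searches only over Kd.

```python
import math


def isotherm(protein, kd, fp_free, fp_bound):
    """One-site binding isotherm (assumed model; ligand depletion neglected)."""
    return fp_free + (fp_bound - fp_free) * protein / (kd + protein)


def _linear_fit(protein, signal, kd):
    """For a fixed Kd, least-squares fit of signal = a + b * [P]/(Kd + [P])."""
    frac = [p / (kd + p) for p in protein]
    n = len(frac)
    f_mean = sum(frac) / n
    s_mean = sum(signal) / n
    sff = sum((f - f_mean) ** 2 for f in frac)
    if sff == 0:
        raise ValueError("protein concentrations must not all be equal")
    sfs = sum((f - f_mean) * (s - s_mean) for f, s in zip(frac, signal))
    b = sfs / sff
    a = s_mean - b * f_mean
    sse = sum((s - (a + b * f)) ** 2 for f, s in zip(frac, signal))
    return sse, a, b


def fit_kd(protein, signal, kd_min=1e-3, kd_max=1e5, grid=400, refinements=60):
    """Return (Kd, FP_free, FP_bound) minimizing the sum of squared residuals."""
    step = (math.log(kd_max) - math.log(kd_min)) / (grid - 1)
    logs = [math.log(kd_min) + i * step for i in range(grid)]

    def sse_at(log_kd):
        return _linear_fit(protein, signal, math.exp(log_kd))[0]

    # Coarse scan on a logarithmic grid, then ternary search in the bracket.
    best = min(range(grid), key=lambda i: sse_at(logs[i]))
    lo = logs[max(best - 1, 0)]
    hi = logs[min(best + 1, grid - 1)]
    for _ in range(refinements):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sse_at(m1) < sse_at(m2):
            hi = m2
        else:
            lo = m1
    kd = math.exp((lo + hi) / 2.0)
    _, a, b = _linear_fit(protein, signal, kd)
    return kd, a, a + b


if __name__ == "__main__":
    # Hypothetical, noise-free titration (protein in nM, polarization in mP).
    conc = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
    obs = [isotherm(c, 50.0, 40.0, 200.0) for c in conc]
    kd, fp_free, fp_bound = fit_kd(conc, obs)
    print(f"Kd = {kd:.1f} nM, FP_free = {fp_free:.1f}, FP_bound = {fp_bound:.1f}")
```

If the fitted Kd were comparable to the 5 nM DNA concentration, a quadratic (depletion-corrected) binding equation would be the more appropriate model.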

Sequence alignment. The cas1 sequences were obtained from the National
Center for Biotechnology Information (NCBI) Gene Data Bank. A representative
cas1 from each CRISPR type I subtype was chosen based on previous subtype
assignments and the alignment was generated using MAFFT. The organisms
chosen for the alignment are: Escherichia coli K-12, Cronobacter dublinensis
str. 582, Erwinia amylovora, Yersinia pestis biovar Antiqua str. B42003004,
Yersinia kristensenii, Hafnia alvei, Sulfolobus solfataricus, Thermotoga maritima,
Pseudothermotoga lettingae, Deferribacter desulfuricans, Desulfovibrio vulgaris,
Bacillus halodurans, Bacillus cereus, Synechocystis sp. PCC 6803, Cyanothece sp.
PCC 8802 and Limnoraphis robusta.

Extended Data Figure 1 | Effect of overhang length on integration efficiency.
a, A plot of the per cent integration of protospacers ± standard deviation with
varying 3′ single-stranded DNA extensions. A representative gel is shown in
Fig. 1a. b, Protospacer sequences used for the assays described in a and Fig. 1a,
with the red nucleotides indicating the 3′ overhang regions.

Extended Data Figure 2 | Assembly of Cas1-Cas2 complex bound to protospacer
DNA. a, Gel filtration chromatogram of pre-assembled Cas1-Cas2 complex with
protospacer DNA containing five-nucleotide 3′ overhangs. The dotted lines
indicate the peak fractions of the Cas1-Cas2 complex without DNA, as shown in d.
The dotted lines indicate the peak fractions of the Cas1-Cas2 complex bound to
DNA (first peak) and excess, unbound DNA (second peak). b, c, The fractions from
peak 1 (~12 ml) and peak 2 (~15 ml) were analysed by Coomassie-stained SDS-PAGE
(b) and 12% urea-PAGE (c) to confirm the presence of Cas1, Cas2 and protospacer
DNA. d, Gel-filtration chromatogram of assembled Cas1-Cas2 without protospacer
DNA. e, Coomassie-stained SDS-PAGE of the peak fractions from d. Supplementary
Information contains the full images for b, c and e.

Extended Data Figure 3 | Conformational dynamics upon protospacer DNA binding.
a, An overlay of the DNA-bound Cas1-Cas2 structure with the apo Cas1-Cas2 (grey,
PDB 4P6I). b, Vector lines depicting the conformational changes the Cas1-Cas2
complex undergoes upon protospacer DNA binding compared to the apo complex
(PDB 4P6I). The Cas1 subunits rotate towards the direction of the arrows.


© 2015 Macmillan Publishers Limited. All rights reserved 




[Extended Data Figure 4 panels: a; b, structure with Mg²⁺ (terminus 1, terminus 2); c, structure without Mg²⁺ (terminus 1, terminus 2); 3′ ends labelled.]

Extended Data Figure 4 | Omit maps of the protospacer DNA. a, Simulated
annealing Fo − Fc omit electron density map of the entire protospacer DNA using
the ‘no Mg²⁺’ map and model. b, c, Simulated annealing Fo − Fc omit electron
density maps of the terminal five nucleotides in the active sites of the structures
with Mg²⁺ (b) or without Mg²⁺ (c) in the crystallization condition. The maps are
contoured at 2.0σ.

Extended Data Figure 5 | Sequence alignment of Cas1 proteins in type I CRISPR
systems. Sequence alignments of Cas1 from representative organisms with type I
CRISPR systems. The E. coli sequence is displayed at the top. The dots indicate
the residues described in this study, with the red dots indicating the
metal-binding residues. The box highlights the non-universal conservation of the
E. coli Y22 residue in the β1 region of type I CRISPR systems. The secondary
structure representations shown are for the E. coli Cas1.

Extended Data Figure 6 | Integration of protospacer substrates with splayed
ends. a, Representative agarose gel of in vitro integration reactions using
increasing lengths of splayed ends. The average per cent integration of three
independent experiments is plotted in Fig. 3d. b, Sequences of protospacers used
in the integration assays in a. c, A 12% denaturing polyacrylamide gel of
protospacers after incubation with Cas1-Cas2 for 1 h at 37 °C in integration
assay buffer conditions. The indicated DNA substrates are radiolabelled at the
5′ end. Supplementary Information contains the full images for a and c. nt,
nucleotide.

Extended Data Figure 7 | Crystallographic packing of the complex bound to Mg²⁺.
a, View of the symmetry mates (grey) contacting the non-catalytic Cas1 subunits
(green). Catalytic Cas1 subunits are shown in blue, Cas2 in … crystal structures,
with or without Mg²⁺, shows a slight DNA kink in the structure bound to Mg²⁺
(dotted box). This region contacts α-helix 7 of a symmetry mate, as described in
the text.

Extended Data Table 1 | Summary of X-ray crystallography data collection and refinement

                              Without Mg                With Mg                   Splayed substrate
Data collection
Space group                   P2₁2₁2₁                   P2₁2₁2₁                   P2₁2₁2₁
Cell dimensions
  a, b, c (Å)                 88.02, 120.01, 196.01     75.66, 165.93, 167.26     88.02, 123.01, 196.01
  α, β, γ (°)                 90, 90, 90                90, 90, 90                90, 90, 90
Resolution (Å)                49.00-3.20 (3.36-3.20)    46.41-2.95 (3.06-2.95)    48.9-3.35 (3.42-3.35)
Rmerge (%)                    30.8 (146)                19.6 (157)                28.5 (126)
Rpim (%)                      12.8 (61.4)               10.8 (86.3)               21.6 (94.3)
I/σI                          6.4 (1.5)                 9.8 (1.4)                 5.0 (1.3)
CC1/2                         98.5 (72.4)               99.3 (42.0)               98.3 (72.7)
Completeness (%)              99.8 (99.0)               100 (99.9)                99.6 (97.7)
Redundancy                    6.7 (6.6)                 7.9 (8.0)                 4.1 (4.0)
Wilson B factor (Å²)          63.8                      64.0                      73.7
Refinement
Resolution (Å)                49.00-3.20                46.41-2.95                49.00-3.35
No. reflections               35,808 (3,502)            44,960 (4,418)            31,049 (2,885)
Rwork/Rfree                   24.2/27.0                 23.0/25.4                 23.2/27.4
No. atoms
  Protein                     9,375                     9,576                     9,375
  DNA                         1,142                     1,142                     1,165
  Metal                       0                         4                         0
Average B-factors (Å²)
  Protein                     65.9                      66.6                      86.6
  DNA                         76.2                      67.2                      103.0
  Metal                                                 51.6
R.m.s. deviations
  Bond lengths (Å)            0.003                     0.003                     0.004
  Bond angles (°)             0.72                      0.75                      0.81
Ramachandran statistics (%)
  Favored                     96.0                      95.0                      96.0
  Allowed                     3.75                      4.51                      3.58
  Outliers                    0.25                      0.49                      0.42

One crystal was used for each structure.
Values for the highest-resolution shell are shown in parentheses.

Endoperoxide formation by an α-ketoglutarate-dependent
mononuclear non-haem iron enzyme


Wupeng Yan, Heng Song, Fuhang Song, Yisong Guo, Cheng-Hsuan Wu, Ampon Sae Her, Yi Pu,
Shu Wang, Nathchar Naowarojna, Andrew Weitz, Michael P. Hendrich, Catherine E. Costello,
Lixin Zhang, Pinghua Liu & Yan Jessie Zhang

Many peroxy-containing secondary metabolites have been
isolated and shown to provide beneficial effects to human health.
Yet the mechanisms of most endoperoxide biosyntheses are not
well understood. Although endoperoxides have been suggested
as key reaction intermediates in several cases, the only
well-characterized endoperoxide biosynthetic enzyme is prostaglandin
H synthase, a haem-containing enzyme. Fumitremorgin B
endoperoxidase (FtmOx1) from Aspergillus fumigatus is the first
reported α-ketoglutarate-dependent mononuclear non-haem iron
enzyme that can catalyse an endoperoxide formation reaction.
To elucidate the mechanistic details of this unique chemical
transformation, we report the X-ray crystal structures of FtmOx1
and the binary complexes it forms with either the co-substrate
(α-ketoglutarate) or the substrate (fumitremorgin B). Uniquely,
after α-ketoglutarate has bound to the mononuclear iron centre in
a bidentate fashion, the remaining open site for oxygen binding and
activation is shielded from the substrate or the solvent by a tyrosine
residue (Y224). Upon replacing Y224 with alanine or phenylalanine,
FtmOx1 catalysis diverts from endoperoxide formation
to the more commonly observed hydroxylation. Subsequent
characterizations by a combination of stopped-flow optical
absorption spectroscopy and freeze-quench electron paramagnetic
resonance spectroscopy support the presence of transient radical
species in FtmOx1 catalysis. Our results help to unravel the novel
mechanism for this endoperoxide formation reaction.

The verruculogen biosynthetic gene cluster was identified through
bioinformatic analysis, and its chemical scaffold is assembled by a
non-ribosomal peptide synthetase, followed by several tailoring
reactions. Among them, the FtmOx1-catalysed endoperoxide formation
reaction is the most notable (Fig. 1a). Recent biochemical
characterizations indicate that, unlike prostaglandin H synthase, FtmOx1 is
an α-ketoglutarate (α-KG)-dependent mononuclear non-haem iron
enzyme. Further characterization indicates that molecular oxygen
(O₂) is incorporated into verruculogen without O-O bond scission,
distinguishing FtmOx1 from all currently known α-KG-dependent
mononuclear non-haem iron enzymes.

To unravel the mechanistic details of FtmOx1 catalysis, we first
characterized the FtmOx1-α-KG complex using anaerobically purified and
Fe²⁺-reconstituted FtmOx1 (FtmOx1-Fe(II)). Upon mixing the reconstituted
enzyme with α-KG under anaerobic conditions, a pink species
appeared (pink trace, extinction coefficient ε529 of ~166 M⁻¹ cm⁻¹,
Fig. 1b). The dissociation constant (Kd) of this species was
~185 ± 35 µM (Extended Data Fig. 1a), close to the Kd values of
Fe(II)-α-KG complexes of other mononuclear non-haem iron enzymes
(for example, that of TauD). Upon exposure to O₂ and in the absence
of the substrate fumitremorgin B (1), the pink species faded and a
blue chromophore with a λmax at ~600 nm developed within 30 min
(blue trace, Fig. 1b). Tandem mass spectrometry (MS/MS) analysis of
the blue species indicated the oxidation of Y224 to
dihydroxyphenylalanine (DOPA, Fig. 1c), which is the result of a self-hydroxylation
reaction as observed in other mononuclear non-haem iron enzymes.
Notably, the presence of the substrate fumitremorgin B (1) prevented
the FtmOx1 self-hydroxylation reaction (Extended Data Fig. 2). All
of these properties are consistent with the formation of the
FtmOx1-Fe(II)-α-KG complex.

In previous studies, ascorbate was included as an additional
reductant, although its role in FtmOx1 catalysis was not known. We
observed that FtmOx1 is capable of catalysing fumitremorgin B (1)
oxidation in the absence of ascorbate (Fig. 1d and Extended Data
Fig. 3). At a fixed FtmOx1-Fe(II):fumitremorgin-B ratio of 1:1.5 and with
an excess of O₂, the amount of product increased with the amount
of α-KG until the α-KG:FtmOx1-Fe(II) ratio reached 1.0 (Fig. 1e and
Extended Data Fig. 4). In contrast, when the O₂:FtmOx1-Fe(II) ratio
was below 1.0, only a small amount (~0.2 equivalent) of product was
formed. Above 1.0, the amount of product increased with the increasing
amount of O₂, and plateaued when the O₂:FtmOx1-Fe(II) ratio was >2.0
(Fig. 1f and Extended Data Fig. 4). These results strongly suggest that
each FtmOx1-catalysed turnover consumes one equivalent of α-KG
and two equivalents of O₂. Unexpectedly, under our assay conditions,
compound 3 rather than verruculogen (2) was the dominant product
(Fig. 1e, f and Supplementary Information).

We determined the FtmOx1 crystal structure at 1.95 Å resolution
with the phase derived from selenomethionine-labelled FtmOx1 using
the single-wavelength anomalous dispersion method. FtmOx1 folds as
a ‘jelly roll’, a prevalent fold in mononuclear non-haem iron enzymes
(Fig. 2a). Two molecules in each asymmetric unit form a functional
dimer, consistent with our size-exclusion chromatography profile
and previous literature reports. The dimer interface (2,461.6 Å²)
accounts for 17.1% of the FtmOx1 surface. The active site pocket at the
dimer interface has a volume of 222.6 Å³, as calculated by the DogSite
Server. This spacious pocket is partitioned into two parts: a
hydrophilic region where the non-haem iron centre is located and a
hydrophobic pocket formed by L64, F115 and F233 from one monomer with
I267 and V268 from the other (Fig. 2b).

H129, H205, D131 and three well-ordered water molecules form
an approximately octahedral coordination to the mononuclear iron
(Fig. 2c). One of the water ligands is hydrogen-bonded to Y224,
whose proximity to the mononuclear iron centre (~4.4 Å) explains
the formation of DOPA in the FtmOx1 self-hydroxylation reaction
(Fig. 1c). Co-crystallization or soaking of the co-substrate α-KG led
to an identical FtmOx1-Fe(II)-α-KG complex, in which α-KG binds to
the iron centre in a bidentate fashion by replacing two water molecules

Figure 1 | Enzymatic characterization of wild-type FtmOx1. a, Proposed
FtmOx1 reaction. b, Formation of the FtmOx1-Fe(II)-α-KG binary complex
under anaerobic conditions (pink trace). Self-hydroxylation reaction
upon exposure of the binary complex to O₂ (blue trace). c, Electrospray
ionization MS/MS analysis of the blue species in b is consistent with the
oxidation of Y224 to DOPA224. d, Products formed in the FtmOx1 reaction.
e, α-KG stoichiometry analysis in FtmOx1 catalysis. High-performance
liquid chromatography (HPLC) chromatograms of FtmOx1 reactions
with various amounts of α-KG. Identities of the peaks were assigned
based on nuclear magnetic resonance (NMR) and high-resolution mass
spectrometry (see Supplementary Information). f, O₂ stoichiometry analysis
in FtmOx1 catalysis. HPLC chromatograms of FtmOx1 reactions containing
fumitremorgin B (360 µM), FtmOx1 (240 µM) and α-KG (480 µM) when
variable amounts of oxygen-saturated buffer were added to initiate the
reaction.

(Fig. 2d). In the FtmOx1-Fe(II)-α-KG complex, the 2-keto group of
α-KG coordinates to the iron centre trans to D131. Its 1-carboxylate
group binds trans to H205, which is the distal histidine of the
2-His-1-carboxylate facial triad (Fig. 2d and Extended Data Fig. 5a). In this
FtmOx1-Fe(II)-α-KG complex, the remaining water ligand (a potential
site for O₂ binding and activation) is completely shielded from solvent
or substrate by Y224 (Fig. 2d and Supplementary Video 1). In
contrast, in most reported structures of enzyme-α-KG complexes, the
1-carboxylate of α-KG coordinates trans to the proximal histidine of the
facial triad motif, and the remaining open site for O₂ binding and
activation directly points towards the substrate. As a result, the oxoferryl
(Fe(IV)=O) species produced from oxygen activation is accessible to the
substrate for oxidative transformations (for example, TauD in Fig. 2e
and Supplementary Video 2).

To examine whether Y224 changes location upon substrate
binding, we also solved the structure of the FtmOx1-Fe(II)-
fumitremorgin-B complex at a resolution of 2.2 Å (Fig. 2f). The location
of the positive density is consistent for all data sets collected
for this complex by either co-crystallization or soaking (>15 data
sets). Substrate is modelled into the density at the active site with
an average occupancy of ~60% owing to the high hydrophobicity
of fumitremorgin B (Extended Data Fig. 5b). In this complex, Y224
adopts a conformation identical to that observed in FtmOx1
alone (Fig. 2c) and in the FtmOx1-Fe(II)-α-KG complex (Fig. 2d). Rings
A and B of the substrate form π-π stacking with the Y224 side chain
at a distance of ~3.3 Å (Fig. 2f). Superimposition of the structure
of the FtmOx1-Fe(II)-α-KG complex onto that of the FtmOx1-Fe(II)-
fumitremorgin-B complex revealed that the side chain of Y224 effectively
separates the potential O₂ binding site from the substrate binding
pocket (Fig. 2g, Extended Data Fig. 5c and Supplementary Video 1).
This is in notable contrast to TauD, in which the oxygen binding and
activation site directly faces the substrate (Extended Data Fig. 5d and
Supplementary Video 2).

On the basis of the strategic positioning of Y224 in the FtmOx1 active
site, we next examined the role it plays in FtmOx1 catalysis. We
characterized two Y224 variants, Y224A- and Y224F-substituted FtmOx1.
The enzyme-α-KG complexes of both variants exhibit Kd values close
to that of the wild-type FtmOx1 (Extended Data Fig. 1b, c). However,
the product profiles of both variants were very different (Fig. 3). The
Y224A-substituted variant produced a mixture of at least five detectable
products, mainly dealkylation products (compounds 4, 5 and
6). Endoperoxides (2 and 3) account for only ~15% of the product
population (Fig. 3a, b).

The Y224F-substituted variant also produced endoperoxides (2
and 3) and dealkylation products (4 and 5). In addition, there were
more endoperoxides (2 and 3) formed by the FtmOx1(Y224F) variant
relative to the FtmOx1(Y224A) variant (~35% versus ~15% of the
product mixture, Fig. 3a). For the FtmOx1(Y224F)-Fe(II)-α-KG complex
in the absence of the substrate fumitremorgin B (1), exposure to
O₂ caused the complex to slowly change colour to blue, which implies
DOPA formation (Extended Data Fig. 6a). DOPA formation can be
explained by two sequential hydroxylation steps (F224→Y224 and
Y224→DOPA224). Indeed, this conclusion was supported by MS/MS
analysis of this variant (Extended Data Fig. 6b-e). Thus, the higher
level of endoperoxides (2 and 3) produced by the FtmOx1(Y224F) variant
is probably attributable to the conversion of the variant to wild-type
FtmOx1, which provides further evidence supporting the key role of
Y224 in FtmOx1 catalysis.

Mononuclear non-haem iron enzymes catalyse a wide range of
reactions. Recently, unique transformations have been reported
which demonstrate the functional versatility of this class of enzymes,
including oxidative dehydrogenation in epoxide formation,
chlorination, epimerization, and C-C bond cleavage. FtmOx1
provides a further example of this diversity. On the basis of our
structural and biochemical information, we propose a preliminary
FtmOx1 mechanistic model (Fig. 4). After α-KG and substrate binding,

Figure 2 | Structures of FtmOx1. a, Overall architecture of FtmOx1 shown
as a functional dimer with one monomer colour-coded based on secondary
structures (shown as stereo images). The iron centre is labelled as a grey
sphere. b, FtmOx1 active site shown in the electrostatic mode. c, FtmOx1
metallo-centre electron density (2mFo − DFc map) at 1σ contour. The
coordination of iron is represented by dashed lines. d, FtmOx1-Fe(II)-α-KG
binary complex. The α-KG molecule was modelled into a composite omit map
(mFo − DFc map) contoured to 2.8σ. The coordination of iron is represented
by dashed lines with distances labelled (units, Å). e, α-KG binding mode of
TauD (PDB accession code 1OS7). TauD is shown in an identical orientation
relative to that of FtmOx1 in d to highlight their differences in active site
topologies. f, Structure of the FtmOx1-Fe(II)-fumitremorgin-B complex.
g, Superimposition of the binary structures of FtmOx1-Fe(II)-α-KG and
FtmOx1-fumitremorgin-B. Y224 is highlighted in pink.


the first molecule of O₂ is activated to produce an Fe(IV)=O species
(species B) by a mechanism similar to that of other members of this class
of enzymes. Uniquely, in FtmOx1, because the O₂ activation site is
shielded from the substrate by Y224, direct oxidation of the substrate
by the Fe(IV)=O species is less likely. Instead, the Fe(IV)=O species oxidizes
Y224 to a tyrosyl radical (species C), which then removes a hydrogen
atom from the fumitremorgin B C21 position to form a substrate-based
radical (species D). A second molecule of O₂ reacts with species D to
form a peroxyl radical (species E). It then reacts with the other prenyl
arm to produce the endoperoxide along with the formation of a carbon

Figure 3 | Characterization of Y224A- and Y224F-substituted FtmOx1.
a, HPLC profiles of reactions from the two FtmOx1 variants, Y224A and
Y224F. Both traces were obtained from reaction mixtures containing FtmOx1
Y224 variants (240 µM), fumitremorgin B (240 µM), α-KG (720 µM) and
O₂ (480 µM). Trace I, HPLC chromatogram of the reaction using
Y224A-substituted FtmOx1; trace II, HPLC chromatogram of the reaction using
Y224F-substituted FtmOx1. Note that a new column was used for the mutant
analyses relative to the one in wild-type FtmOx1 characterizations, which
led to the differences in retention times relative to the other HPLC traces.
b, Products formed in reactions using either Y224F- or Y224A-substituted
FtmOx1 variants. The compounds were characterized using NMR and MS
(see Supplementary Information).


centre radical at the C26 position (species F). Species F can re-oxidize
Y224 to a tyrosyl radical (species G). Starting from species G, two pathways
are possible. FtmOx1 can follow a mechanism similar to that of
prostaglandin H synthase (ref. 9), in which, once the tyrosyl radical is formed,
multiple cycles of endoperoxide formation can be mediated through
this radical (pathway I, Fig. 4). However, the production of compound 3
in the FtmOx1 reaction points to another possibility (pathway II,
Fig. 4), in which the two electrons provided by the 2→3 oxidation
process reduce both Fe³⁺ and the tyrosyl radical to the resting state of
FtmOx1 (species A).

The formation of a small amount of endoperoxides in Y224A- and
Y224F-substituted FtmOx1 may be due to two competing pathways:
the hydroxyl-rebound and the endoperoxide formation pathways
(Extended Data Fig. 7). After Fe(IV)=O is formed, it may directly remove
a hydrogen atom from the fumitremorgin B (1) C21 position to form a
substrate-based radical (species C′, Extended Data Fig. 7). Subsequent
rebound by the hydroxyl radical will lead to the formation of
hydroxylation products (pathway I′, Extended Data Fig. 7). Decomposition
of the hydroxylation reaction product forms compounds 4 and 5. At
the same time, the substrate-based radical (species C′) may be trapped
by a second molecule of O₂, which leads to endoperoxide formation
(pathway II′, Extended Data Fig. 7).

To gain evidence supporting the presence of radical species in FtmOx1
catalysis as outlined in our FtmOx1 mechanistic model (Fig. 4), we
conducted spin-trapping experiments using 5,5-dimethyl-1-pyrroline
N-oxide (DMPO) as the reagent. In the presence of 50 equivalents of
DMPO, further oxidation of verruculogen 2 to 3 was markedly
suppressed, and 2 was the dominant product (Fig. 5a). This result provides
evidence supporting the involvement of radicals in FtmOx1 catalysis.
Next, the FtmOx1 reaction was monitored with stopped-flow optical
absorption spectroscopy (Fig. 5b). The UV-visible spectrum of the
solution generated after rapid mixing of O₂-saturated buffer with
the FtmOx1-Fe(II)-fumitremorgin-B-α-KG complex demonstrated the
accumulation of a transient species centred at ~420 nm. The amount

Figure 4 | Proposed FtmOx1 mechanistic model. The oxygen-oxygen bonds shown
in blue highlight the incorporation of endoperoxide into the substrate
fumitremorgin B.


of this species maximized at ~0.2 s and then decayed within ~3 s. This
wavelength differs from those of the tyrosyl radicals observed in
ribonucleotide reductase and in another reported α-KG-dependent iron enzyme,
CarC (a peak at 410 nm with a shoulder at 390 nm). Chemical quench
experiments performed under the same conditions indicated that the
consumption of substrate 1 and the formation of products (2 and 3)
occurred on the timescale of seconds per cycle (Extended Data Fig. 8a),
suggesting that the 420 nm species observed in the stopped-flow optical
absorption spectroscopy experiments is a kinetically competent
intermediate. FtmOx1 catalysis was then investigated by rapid freeze-quench

Figure 5 | Evidence for transient radical species in the reaction pathway.
a, HPLC chromatograms of the FtmOx1 reaction under three different
conditions. The reaction mixture contained FtmOx1 (240 µM),
fumitremorgin B (200 µM) and α-KG (300 µM), and was initiated with
O₂-saturated buffer. Trace I, FtmOx1 reaction; trace II, FtmOx1 reaction in the
presence of 10 mM DMPO; trace III, FtmOx1 substrate alone. b, Absorbance
changes upon mixing the O₂-saturated buffer with the reaction mixture
in 100 mM Tris-HCl (pH 7.5) buffer containing FtmOx1 (0.65 mM), Fe(II)
(0.58 mM), fumitremorgin B (0.58 mM) and α-KG (12 mM). The decay of the
Fe-α-KG complex charge transfer band centred at ~520 nm (dashed arrow)
and the formation and decay of the spectral feature centred at ~420 nm

542 | NATURE | VOL 527 | 26 NOVEMBER 2015 


in conjunction with electron paramagnetic resonance (EPR) spectroscopy.
Two EPR signals were observed at 0.01 s (earliest possible time
on instrument) and were highest at ~0.2 s after the rapid mixing of
O₂-saturated buffer with the FtmOx1-Fe(II)-fumitremorgin-B-α-KG
complex (Extended Data Fig. 9a). The first EPR signal, with resonances
at g = 4.54, 4.26 and 3.93 (Extended Data Fig. 9b), belongs to a
high-spin Fe³⁺ species having axial and rhombic zero-field splitting
parameters of |D| < 0.5 cm⁻¹ and E/D ≈ 0.26, respectively. These parameters
are not typical of adventitious Fe³⁺. The second EPR signal was in
the g = 2 region and most likely belongs to a radical species (Fig. 5c

(arrow) are highlighted. Inset: time-dependent absorbance change at 420 nm.
The absorbance reported in b was obtained by blanking the spectrometer
with the anaerobic buffer containing 100 mM Tris-HCl (pH 7.5). The
absorbance reported in the inset was obtained by subtracting the absorbance
at 420 nm of the 2 ms spectrum from all other spectra recorded. The trace
is the average of two trials. c, Spectroscopic evidence for transient radical
species. X-band EPR spectra measured at 19 K of reaction samples
freeze-quenched at the indicated time points. Measurement conditions: microwave
frequency, 9.64 GHz; microwave power, 2 µW; modulation amplitude, 1 mT;
and modulation frequency, 100 kHz.

and Extended Data Fig. 9c). The formation and decay of this radical
signal closely followed the kinetics of the 420 nm absorption feature
observed in stopped-flow optical absorption spectroscopy experiments
(Extended Data Fig. 8b), indicating that they are from the same
intermediate species. Spin quantification of the EPR signals at ~0.2 s
revealed that the Fe³⁺ and radical species accumulated to ~0.35 and
~0.25 equivalents, respectively. The width of the radical EPR signal
(~12 mT edge-to-edge width, Fig. 5c) was significantly broader than
that of magnetically isolated organic or protein radical signals. Such
broadening could be due to a magnetic dipolar interaction of the radical
species with an adjacent spin centre, most likely the Fe³⁺ centre
depicted in species D, E or F in Fig. 4.
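
As a reading aid for the g values quoted above, the EPR resonance condition hν = gμB·B converts each g value into the magnetic field at which it appears for a given microwave frequency. The sketch below is not part of the reported analysis; it uses the 9.64 GHz frequency given in the Fig. 5 legend together with the CODATA values of the Planck constant and Bohr magneton, and the function name is hypothetical.

```python
PLANCK_H = 6.62607015e-34         # Planck constant, J s (exact, SI 2019)
BOHR_MAGNETON = 9.2740100783e-24  # Bohr magneton, J/T (CODATA 2018)


def resonance_field_mT(g_value, frequency_GHz):
    """Field (mT) satisfying h*nu = g*mu_B*B for the given g value and frequency."""
    if g_value <= 0:
        raise ValueError("g value must be positive")
    field_tesla = PLANCK_H * frequency_GHz * 1e9 / (g_value * BOHR_MAGNETON)
    return 1e3 * field_tesla


if __name__ == "__main__":
    # g values taken from the text; 9.64 GHz from the Fig. 5 legend.
    for g in (4.54, 4.26, 3.93, 2.0):
        print(f"g = {g:4.2f} -> {resonance_field_mT(g, 9.64):6.1f} mT")
```

At this frequency the high-spin Fe³⁺ resonances near g ≈ 4.3 fall at roughly half the field of the g = 2 radical signal, so the two signals are well separated in the spectrum.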

In summary, our FtmOx1 structural and biochemical characteri- 
zation provides a notable example of the catalytic versatility of mono- 
nuclear non-haem iron enzymes and how changes in the secondary 
coordination sphere to the non-haem iron facilitate unprecedented 
chemical transformations. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS


No statistical methods were used to predetermine sample size. 

Materials and experimental procedures. Fumitremorgin B was isolated from 
Aspergillus fumigatus strain IM-MF330 according to the procedure summarized 
in a later section. All reagents were purchased from Sigma-Aldrich unless other- 
wise stated. 

Nuclear magnetic resonance (NMR) spectra were obtained on a Bruker Avance
DRX600 spectrometer in the solvents indicated and referenced to residual ¹H and
¹³C signals in deuterated solvents. High-resolution electrospray ionization (ESI)
mass spectrometry (MS) measurements were obtained on a Bruker micrOTOF 
mass spectrometer. High-performance liquid chromatography (HPLC) was per- 
formed using an Agilent 1200 Series separations module equipped with Agilent 
1200 Series diode array detectors and an Agilent 1200 Series fraction collector, 
controlled using ChemStation. UV-vis analysis was performed on a Varian Cary 
100 Bio UV-vis spectrophotometer. 
Sub-cloning and overexpression of wild-type, Y224F- and Y224A-substituted 
FtmOx1. The coding sequence of the FtmOx1 gene from A. fumigatus Af293 
(accession number: XM_742088) was sub-cloned into the EcoRI and XhoI 
restriction sites of the pASK-IBA3+ vector, which places it under the control
of the tet-promoter and allows for the production of C-terminally strep-tagged 
FtmOx1. The final recombinant FtmOx1 includes some extra amino acid residues 
at the N terminus and a strep-tag at the C terminus for purification. The residue 
numbering used in this manuscript is based on the FtmOx1 sequence deposited 
in GenBank (accession number: XM_742088). Y224F- and Y224A-substituted 
FtmOx1 were generated using a Stratagene QuikChange II kit according to the 
manufacturer's instructions. 

Plasmids encoding wild-type, Y224F-, and Y224A-substituted FtmOx1 mutant 
genes were used to transform Escherichia coli BL21 (DE3) cells (Invitrogen Inc.) 
for protein overexpression. A single colony was used to inoculate a starter cul- 
ture, which was incubated at 37°C overnight. Production cultures were grown 
at 37°C in Luria-Bertani medium supplemented with 100 µg ml⁻¹ ampicillin to
an optical density (OD600) of ~0.8 and then cooled to 25°C. The FtmOx1 protein
production was induced by the addition of anhydrotetracycline to a final concentration
of 250 µg l⁻¹. The cultures were grown at 25°C for an additional 16 h
before harvesting. 

Purification was performed at 4°C. In a typical purification, ~30 g wet cell paste 
was resuspended in 100 ml of anaerobic buffer (100 mM Tris-HCl, 50 mM NaCl, 
and 5 mM 1,10-phenanthroline (pH 7.5)) in an anaerobic Coy chamber. Lysozyme 
(1.0 mg ml⁻¹ final concentration) and DNase I (100 U per gram of cell) were then
added into the cell suspension, and the mixture was incubated on ice for 40 min 
with gentle agitation. The cells were disrupted by sonication (20 cycles of 10s 
bursts) using a Fisher Scientific Model 505 Sonic Dismembrator. The supernatant 
and the cell debris were anaerobically separated by centrifugation at 4°C for 30 min 
at 20,000g. Streptomycin sulfate was added into the supernatant (~100 ml) to a
final concentration of 1% (w/v), and the mixture was incubated on ice for 30 min 
with gentle agitation. The DNA precipitate was then removed by centrifugation at 
20,000g for 40 min at 4°C. The resulting supernatant was mixed with Strep-Tactin 
resin (50 ml) and incubated on ice for 30 min. After the cell lysate was drained 
by gravity, the column was washed with washing buffer (100 mM Tris-HCl and 
150 mM NaCl (pH 7.5)) until the OD280 was <0.05. The FtmOx1 protein was
reconstituted by incubating the protein-loaded resin with 50 ml of a solution containing
3.0 mM ammonium ferrous sulfate and 5.0 mM ascorbate at 4°C for 10 min.
After the excess solution was drained by gravity, the resin was further washed with
washing buffer until the OD280 was <0.05. Recombinant FtmOx1 was eluted with
elution buffer (2.5 mM desthiobiotin in 100 mM Tris-HCl and 50 mM NaCl (pH
7.5)). The eluted protein was concentrated, flash frozen with liquid nitrogen, and
stored at −80°C. From 30 g of wet cell paste, ~400 mg of protein was obtained. The
purity of the protein was shown by SDS-PAGE (12%) as a single band. The FtmOx1
concentration was calculated using ε280 nm of 43,288 M⁻¹ cm⁻¹ determined by
amino acid analysis. 
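
The concentration calculation above follows the Beer-Lambert law. A minimal sketch is given below, assuming a 1-cm path length and an optional dilution factor; the function name and the example absorbance are hypothetical and are not taken from the original procedure.

```python
# Beer-Lambert sketch: c = A / (epsilon * l).
# EPSILON_280 is the extinction coefficient quoted in the text (M^-1 cm^-1).
EPSILON_280 = 43288.0


def protein_concentration_molar(a280, path_cm=1.0, dilution=1.0,
                                epsilon=EPSILON_280):
    """Return the protein concentration in mol/l from an A280 reading.

    a280     : measured absorbance at 280 nm (blank-corrected)
    path_cm  : cuvette path length in cm (assumed 1 cm here)
    dilution : fold dilution applied before the measurement (assumption)
    """
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and epsilon must be positive")
    return a280 * dilution / (epsilon * path_cm)


# Hypothetical reading: A280 = 0.43288 in a 1-cm cuvette, undiluted.
conc = protein_concentration_molar(0.43288)
print(f"{conc * 1e6:.2f} uM")
```

The dilution argument is only a convenience for concentrated stocks that must be diluted into the linear range of the spectrophotometer.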

Selenomethionine-incorporated FtmOx1 was prepared using a modified 
medium. A single colony was used to inoculate 50 ml Luria-Bertani medium 
supplemented with 100 µg ml⁻¹ ampicillin, which was incubated at 37°C until the
OD600 was ~0.5. The pre-culture (2 ml) was transferred into 150 ml of minimal
media (1 l minimal media contained 50 ml glycerol, 12.8 g Na2HPO4·7H2O, 3 g
KH2PO4, 0.5 g NaCl, 1 g NH4Cl, 0.2% glucose, 0.1 mM CaCl2, and 2.0 mM MgSO4)
supplemented with 100 µg ml⁻¹ ampicillin, which was incubated at 37°C for an
additional 5 h. Then, 10 ml of the pre-culture was transferred into 1 l minimal
media supplemented with 100 µg ml⁻¹ ampicillin and incubated at 37°C for 12 h.
Subsequently, 10 ml of 100× amino acid solution mix (the 100× amino acid solution
contained 100 mg lysine, 100 mg threonine, 100 mg phenylalanine, 50 mg leucine,
50 mg isoleucine, and 50 mg valine in 10 ml H2O) and 100× selenomethionine
solution (60 mg L-selenomethionine in 10 ml H2O) were added to the culture
medium. After 0.5 h, the temperature was decreased to 25°C, and FtmOx1 overexpression
was induced by the addition of anhydrotetracycline to a final concentration
of 250 µg l⁻¹. The cultures were grown at 18°C for an additional 12 h before
harvesting.

Selenomethionine-incorporated FtmOx1 was purified according to the same 
procedure described earlier. From 5 g of wet cell paste, ~15 mg of selenomethionine- 
incorporated FtmOx1 was obtained. 

Before crystallization, FtmOx1 was further purified by gel filtration (Superdex 
200, GE Healthcare) in buffer containing 100 mM Tris-HCl at pH 7.5 and 50 mM 
NaCl. After gel filtration, FtmOx1 was concentrated to ~10 mg ml⁻¹ and stored at
−80°C for future crystallization experiments.

Isolation of fumitremorgin B. Aspergillus fumigatus strain IM-MF330 was 
isolated from a mud sample collected from the Yellow Sea. A small number of 
spores growing on a potato dextrose agar slant was inoculated into a 250-ml 
conical flask containing 40 ml of liquid medium (20% potato infusion, 2.0% glu- 
cose, 3.5% sea salt, and distilled water) and then cultured at 28°C for 3 days on 
a rotary shaker at 160 rpm. The seed culture (5 ml) was inoculated into 1,000-ml 
conical flasks, each containing 130 g rice and 80 ml artificial seawater, and incu- 
bated without aeration for 19 days. The fermentation product was exhaustively 
extracted with EtOAc:MeOH (80:20) to yield a crude extract. The crude extract 
was partitioned between EtOAc and H2O. The EtOAc layer (10.4 g) was applied
to a column of silica gel using a gradient solvent system of 50-100% petroleum
ether/CH2Cl2 and 0-100% MeOH/CH2Cl2 to afford 15 fractions. Fraction MF330F
was passed through a Sephadex LH-20 column and eluted with petroleum ether:
CH2Cl2:MeOH (5:5:1) to yield 5 sub-fractions. The third fraction MF30F3 was
subsequently subjected to HPLC fractionation (Agilent Zorbax SB-C18 5 µm
250 × 9.4 mm column, 3.0 ml min⁻¹, 65% MeOH) to yield verruculogen and
fumitremorgin B, respectively. 

Crystallization and data collection. FtmOx1 crystallization was set up using 
the sitting-drop vapour diffusion method by mixing protein and crystallization 
buffer (100 mM MES (pH 6.5), 50 mM CoCl2, and 2 M ammonium sulfate) at a
ratio of 2:1 at room temperature. Sheet-like crystals were visible after 7 days. The
FtmOx1 and α-ketoglutarate (α-KG) complex was obtained using both soaking
and co-crystallization methods, which led to identical models. Crystal soaking
was conducted by transferring the pre-formed FtmOx1 crystals into crystallization
mother liquor containing 1 mM α-KG and incubating for 2 h at room temperature.
Co-crystallization trials included pre-mixing the protein with α-KG at a ratio
of 1:100 for 2 h before crystallization setup. The crystals were cryoprotected by the
addition of 25% glycerol in mother liquor before being vitrified in liquid nitrogen
for data collection. To obtain the structure of the FtmOx1-fumitremorgin-B
complex, we crystallized FtmOx1 in an anaerobic chamber using identical conditions.
Sheet-like crystals appeared within 3 days and continued to grow for
another week before reaching maximal size. Fumitremorgin B was dissolved to
saturation in degassed crystallization mother liquor containing 0.05% Triton X-100
and 20% glycerol. After centrifugation to discard insoluble material,
the mother liquor containing a saturating amount of fumitremorgin B was used to
soak FtmOx1 crystals as sitting drops for 90 min, after which the crystals were
cryoprotected with degassed mother liquor containing 30% glycerol.

Crystal diffraction data were collected at the Advanced Photon Source beamline
BL23-ID-B (Argonne, Illinois) for FtmOx1 (wavelength 0.97931 Å) and
selenomethionine-incorporated FtmOx1 (wavelength 0.97958 Å). The diffraction
data for the FtmOx1-α-KG and FtmOx1-fumitremorgin-B binary complexes were
collected at the Advanced Light Source beamline BL5.0.3 (Berkeley, California) at
wavelength 0.97648 Å. All data were collected in a liquid-nitrogen
stream at 100 K. The data were processed using the program HKL2000. The
statistics for data collection are summarized in Extended Data Table 1. 
Structure determination and refinement. The FtmOx1 structure was deter- 
mined by the single anomalous dispersion method using the selenomethionine 
data set with phase information to 3.5 Å resolution. The positions of the selenium
atoms were determined and refined by Phenix.Autosol followed by the density
modification program DM in the CCP4 suite. An initial model was built based
on the phase information using the Buccaneer program, and further extended
and corrected manually in the COOT program. The resolution was extended
to the high-resolution limit of 1.95 Å using the native protein data set. Iterative
cycles of optimization were performed to improve the quality of the model using
the refinement program PHENIX.Refine, followed by manual rebuilding in
COOT. A portion of the diffraction data (5%) was reserved as an unbiased test
set for cross validation (Rfree) for the model, which eventually had an Rwork of 16.1%
and an Rfree of 19.9%. The structures of the FtmOx1-α-KG binary complex and
FtmOx1-fumitremorgin-B complex were both solved by molecular replacement
with the FtmOx1 structure as the initial model using Phaser in the CCP4 package.
The co-substrate α-KG and substrate fumitremorgin B were built using
COOT followed by several rounds of refinement by PHENIX.Refine. For the
FtmOx1-α-KG complex, the final Rwork was 20.3% with an Rfree of 26.0%. For the
FtmOx1-fumitremorgin-B complex, the final Rwork was refined to 16.7% and Rfree
to 20.3%. Model quality for all of the structures was evaluated with Ramachandran
statistics and MolProbity. The structures show no outliers and most residues were in the
favoured region of Ramachandran statistics (98.6% for apo FtmOx1, 98.1% for
the FtmOx1-α-KG complex and 98.3% for FtmOx1-fumitremorgin-B). When
evaluated by MolProbity, all three structures rank 100% within the specific resolution
range (apo FtmOx1 with a MolProbity score of 1.04, FtmOx1-α-KG complex
1.26 and FtmOx1-fumitremorgin-B 1.12, respectively). Refinement statistics are
summarized in Extended Data Table 1. Figure 2 and Extended Data Fig. 5 were
prepared with PyMol.

Oxygen concentration determination for oxygenated buffer. Oxygen-saturated 
buffer (10 ml) was transferred into syringes (12 cm³) with a long needle, and
the syringe was then sealed. An alkaline KI solution (2.1 M KI and 8.7 M KOH
prepared using oxygen-free water) and MnSO4 solution (2.1 M) in oxygen-free
water were prepared in a Coy chamber and transferred out using syringes sealed
by a rubber septum. The alkaline KI solution (0.2 ml) and the MnSO4 solution
(0.2 ml) were quickly aspirated into the syringe containing the 10 ml oxygen-saturated
buffer. Then, the syringe was quickly sealed again. The sample in the syringe
was intensely mixed (turning the syringe ~10 times upside-down until the entire
syringe was filled with the floating Mn(OH)3 precipitate); the Mn(OH)3 precipitate
formed completely in 45 min according to the following reaction:


$$4\,\mathrm{Mn^{2+}} + \mathrm{O_2} + 8\,\mathrm{OH^-} + 2\,\mathrm{H_2O} \rightarrow 4\,\mathrm{Mn(OH)_3}\downarrow \qquad (1)$$


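After 45 min, H2SO4 solution (0.2 ml, 2.7 M) was aspirated into the syringe,
and Mn³⁺ ions oxidized iodide to iodine under acidic conditions according to
the following reactions:


$$2\,\mathrm{Mn(OH)_3(s)} + 3\,\mathrm{H_2SO_4} \rightarrow 2\,\mathrm{Mn^{3+}} + 3\,\mathrm{SO_4^{2-}} + 6\,\mathrm{H_2O} \qquad (2)$$


$$2\,\mathrm{Mn^{3+}} + 2\,\mathrm{I^-} \rightarrow 2\,\mathrm{Mn^{2+}} + \mathrm{I_2} \qquad (3)$$

Iodine eventually formed I3⁻ ions with the excess KI:

$$\mathrm{I_2} + \mathrm{I^-} \rightarrow \mathrm{I_3^-} \qquad (4)$$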


The resulting iodine solution was transferred to a sample bottle and immediately
titrated with standardized 2.5 mM Na2S2O3 solution:


$$\mathrm{I_3^-} + 2\,\mathrm{S_2O_3^{2-}} \rightarrow 3\,\mathrm{I^-} + \mathrm{S_4O_6^{2-}} \qquad (5)$$


According to reactions (1)-(5), one equivalent of oxygen molecule corresponds
to four equivalents of Na2S2O3. Therefore, the oxygen concentration in the
oxygen-saturated buffer was determined based on the amount of standardized
2.5 mM Na2S2O3 solution used for titration.
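
The 1:4 oxygen-to-thiosulfate relationship can be turned into a short calculation. The sketch below assumes that the whole 10 ml buffer sample is titrated and that the small reagent additions do not change the amount of dissolved oxygen; the function name and the example titrant volume are hypothetical.

```python
def oxygen_concentration_mM(titrant_volume_ml, titrant_conc_mM=2.5,
                            sample_volume_ml=10.0):
    """Dissolved O2 (mM) in the buffer sample from a thiosulfate titration.

    Reactions (1)-(5) give 1 O2 : 4 S2O3^2-, so the amount of O2 is one
    quarter of the thiosulfate consumed.
    """
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    thiosulfate_umol = titrant_volume_ml * titrant_conc_mM  # ml x mM = umol
    oxygen_umol = thiosulfate_umol / 4.0
    return oxygen_umol / sample_volume_ml  # umol / ml = mM


# Hypothetical end point: 19.2 ml of 2.5 mM Na2S2O3 for a 10 ml sample.
print(round(oxygen_concentration_mM(19.2), 3))
```

With these illustrative numbers the result is 1.2 mM, the oxygen concentration quoted for the oxygen-saturated buffer in the stoichiometry experiments.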

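The Na2S2O3 concentration was standardized with an iodine solution, which
was prepared by mixing a standard KIO3 solution and KI solution under acidic
conditions:


$$\mathrm{KIO_3} + 5\,\mathrm{KI} + 6\,\mathrm{H^+} \rightarrow 3\,\mathrm{I_2} + 6\,\mathrm{K^+} + 3\,\mathrm{H_2O};\quad \mathrm{I_2} + \mathrm{I^-} \rightarrow \mathrm{I_3^-};\quad \mathrm{I_3^-} + 2\,\mathrm{S_2O_3^{2-}} \rightarrow 3\,\mathrm{I^-} + \mathrm{S_4O_6^{2-}}$$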


α-KG and oxygen stoichiometries in the FtmOx1 catalysis. To examine whether
FtmOx1 is capable of catalysing verruculogen oxidation in the absence of other
reductants, the FtmOx1 reaction was conducted under the following conditions:
a 200 µl anaerobic mixture in 100 mM Tris-HCl (pH 7.5) contained fumitremorgin
B (360 µM), α-KG (4 mM), and variable amounts of iron-loaded FtmOx1
(0.25×, 0.5×, 1×, and 2.0× of iron-loaded FtmOx1 relative to the fumitremorgin
B concentration). The reaction was initiated by quickly mixing the above solution
with 200 µl oxygen-saturated buffer (1.2 mM) in the Coy chamber to make a
solution containing 600 µM oxygen. The final reaction mixture contained 180 µM
fumitremorgin B, 2 mM α-ketoglutarate, 600 µM oxygen, and a variable amount of
iron-loaded FtmOx1 (0.25×, 0.5×, 1×, and 2× of FtmOx1 relative to that of verruculogen
concentration). After the reaction was initiated, the reaction mixture was
sealed and incubated for 0.5 h at 37°C. The enzymatic reaction was quenched by
adding 300 µl chloroform, the precipitated protein was removed by centrifugation
at 13,000g for 10 min, and the chloroform layer was carefully removed. The reaction
mixture was extracted one more time using a second 300-µl volume of chloroform.
The combined chloroform layers were concentrated by rotary evaporation, and
the residue was re-dissolved in 100 µl acetonitrile and subjected to HPLC analysis.

To determine α-KG stoichiometry, a 200 µl anaerobic reaction mixture
(in 100 mM Tris-HCl (pH 7.50)) contained 360 µM fumitremorgin B, 240 µM
iron-loaded FtmOx1, and variable amounts of α-KG. The concentration of α-KG
was varied to make reaction mixtures containing 0.5×, 1.0×, 1.5×, and 2.0× of
α-KG relative to the FtmOx1 concentration. The reaction was initiated by quickly
mixing in 200 µl of oxygen-saturated buffer (1.2 mM) in the Coy chamber to make a
final oxygen concentration of 600 µM. The resulting reaction mixtures contained
a final concentration of 180 µM fumitremorgin B, 120 µM iron-loaded FtmOx1,
600 µM oxygen, and variable amounts of α-KG. After initiation, the reaction was
sealed and incubated for 0.5 h at 37°C. The enzymatic reaction was quenched
by adding 300 µl chloroform, the precipitated protein was removed by centrifugation
at 13,000g for 10 min, and the chloroform layer was carefully removed.
The reaction mixture was extracted one more time using a second 300-µl volume
of chloroform. The combined chloroform layers were concentrated by rotary
evaporation, and the residue was re-dissolved in 100 µl of acetonitrile and subjected
to HPLC analysis.

To determine oxygen stoichiometry, oxygen-saturated buffer was added to a
600 µl anaerobic reaction mixture (100 mM Tris-HCl (pH 7.5) buffer, 360 µM of
fumitremorgin B, 240 µM of iron-loaded FtmOx1, and 480 µM of α-KG). To determine
the amount of product formation under 1× of oxygen relative to iron-loaded
FtmOx1 concentration, the above reaction mixture was quickly mixed with 120 µl
of oxygen-saturated buffer (1.2 mM) in the Coy chamber. To assess the amount
of product formation under 2× of oxygen relative to iron-loaded FtmOx1 concentration,
the above mixture was quickly mixed with 240 µl of oxygen-saturated
buffer (1.2 mM) in the Coy chamber. To determine the amount of product
formation under 3× of oxygen relative to iron-loaded FtmOx1 concentration, the
above mixture was quickly mixed with 360 µl of oxygen-saturated buffer (1.2 mM)
in the Coy chamber. After reaction initiation, the reaction mixtures were sealed and
incubated for 0.5 h at 37°C. The enzymatic reaction was quenched by adding 300 µl
chloroform, the precipitated protein was removed by centrifugation at 13,000g for
10 min, and the chloroform layer was carefully separated. The reaction mixture
was extracted one more time using a second 300-µl volume of chloroform. The
combined chloroform layers were concentrated by rotary evaporation, and the
residue was re-dissolved in 100 µl of acetonitrile and subjected to HPLC analysis.
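
The buffer volumes used for the 1×, 2× and 3× oxygen conditions follow from simple mole bookkeeping. The sketch below checks this arithmetic using the volumes and concentrations stated above; the helper names are hypothetical and chosen only for illustration.

```python
def nmol(volume_ul, conc_mM):
    """Amount of solute in nmol (ul x mM = nmol)."""
    return volume_ul * conc_mM


def oxygen_equivalents(o2_buffer_ul, o2_mM=1.2, mix_ul=600.0, enzyme_mM=0.240):
    """Molar ratio of added O2 to iron-loaded FtmOx1 in the anaerobic mixture."""
    return nmol(o2_buffer_ul, o2_mM) / nmol(mix_ul, enzyme_mM)


def diluted_conc(conc, volume_ul, added_ul):
    """Concentration after adding added_ul of buffer to volume_ul of mixture."""
    return conc * volume_ul / (volume_ul + added_ul)


for added in (120.0, 240.0, 360.0):
    print(added, round(oxygen_equivalents(added), 2))

# Dilution check for the 200 ul + 200 ul mixing step used in the alpha-KG
# stoichiometry experiment: 360 uM fumitremorgin B becomes 180 uM.
print(diluted_conc(360.0, 200.0, 200.0))
```

The same dilution helper reproduces the final concentrations quoted for the equal-volume mixing steps, such as 1.2 mM oxygen becoming 600 µM.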
Reactions using Y224F-substituted FtmOx1. For the reactions using Y224F-substituted
FtmOx1, the anaerobic reaction mixture (600 µl, in 100 mM Tris-HCl
(pH 7.5)) contained 400 µM Y224F-substituted FtmOx1 containing 300 µM
Fe(II), 400 µM fumitremorgin B, and 1,200 µM α-KG. The reaction was initiated
by quickly adding 400 µl of oxygen-saturated buffer (1.2 mM) in the Coy chamber.
The resulting reaction mixtures contained a final concentration of 240 µM
fumitremorgin B, 240 µM Y224F-substituted FtmOx1 containing 192 µM Fe(II), and
720 µM α-KG. After initiation, the reaction mixture was sealed and incubated for
0.5 h at 37°C. The enzymatic reaction was quenched by adding 300 µl chloroform,
the precipitated protein was removed by centrifugation at 13,000g for 10 min, and
the chloroform layer was carefully separated. The reaction mixture was extracted
once more using a second 300-µl volume of chloroform. The combined chloroform
layers were concentrated by rotary evaporation, and the residue was re-dissolved
in 100 µl of acetonitrile and subjected to HPLC analysis.

Reactions using Y224A-substituted FtmOx1. Y224A-substituted FtmOx1 was 
analysed by a procedure similar to that described in ‘Reactions using Y224F-substituted
FtmOx1’, except that Y224A-substituted FtmOx1 was used instead of
Y224F-substituted FtmOx1. 

HPLC analysis of the FtmOx1 reaction products. Enzymatic reaction products 
were routinely analysed by HPLC using a Phenomenex reversed phase C18 column 
(250 mm × 4 mm, 5 µm; Phenomenex). A linear gradient of 30-100% (v/v) acetonitrile
in water was run for 30 min with a flow rate of 0.7 ml min⁻¹, followed by 100%
(v/v) acetonitrile for 5 min. Before the next injection, the column was equilibrated 
with 30% (v/v) acetonitrile for 2 min. The separation profile was monitored using 
a Photo Diode Array detector at 300 nm. 
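
The analytical gradient described above can be written as a piecewise function of run time. This is only an illustrative sketch: it assumes the return to 30% acetonitrile for re-equilibration is an instantaneous step, which the text does not specify, and the function name is hypothetical.

```python
def acetonitrile_percent(t_min, start=30.0, end=100.0,
                         ramp_min=30.0, hold_min=5.0, reequil_min=2.0):
    """Percent (v/v) acetonitrile at time t_min of the analytical HPLC run.

    0 .. ramp_min             : linear ramp from start to end
    ramp_min .. +hold_min     : hold at end
    after the hold            : re-equilibration at start (assumed step change)
    """
    total = ramp_min + hold_min + reequil_min
    if t_min < 0 or t_min > total:
        raise ValueError("time outside the programmed run")
    if t_min <= ramp_min:
        return start + (end - start) * t_min / ramp_min
    if t_min <= ramp_min + hold_min:
        return end
    return start


for t in (0, 15, 30, 33, 36):
    print(t, acetonitrile_percent(t))
```

Changing `ramp_min` to 25 gives the preparative and quench-flow gradients described elsewhere in the Methods, which use the same 30-100% range over a shorter ramp.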

Products of the reaction (compounds 2 and 3) using wild-type FtmOx1 were 
characterized by NMR and high-resolution mass spectrometry (see Supplementary 
Information). 

Isolating products from reactions using wild-type, Y224F-, or Y224A- 
substituted FtmOx1. To characterize the reaction products, a large-scale reaction
(270 ml) was performed using purified FtmOx1 (41.1 µM) containing 32 µM
Fe(II), fumitremorgin B (30.9 µM), and 60 µM α-KG in Tris-HCl buffer (50 mM,
pH 7.0) at 30°C for 2h. Chloroform (300 ml) was added to the reaction mixture. 
The chloroform layer was transferred into centrifuge bottles and centrifuged at 
5,000 rpm for 10 min to remove precipitated proteins. The reaction mixture was 
extracted once more using a second 300-ml volume of chloroform. The combined 
chloroform layers were dried over Na2SO4 for 0.5 h and concentrated by rotary
evaporation. The residue was subjected to HPLC separation on a C18 column
(4.6 × 150 mm). A linear gradient of 30-100% (v/v) acetonitrile in water was run
for 25 min with a flow rate of 0.7 ml min⁻¹, followed by 100% (v/v) acetonitrile
for 5 min. Before the next injection, the column was equilibrated with 30% (v/v)
acetonitrile for 2 min. The elution was monitored using a Photo Diode Array detector
at 300 nm.

Products of the reaction using Y224F- and Y224A-substituted FtmOx1 variants
(compounds 2-6) were characterized using NMR and high-resolution mass spectrometry
(see Supplementary Information).
Determining the α-KG dissociation constant. To maintain an anaerobic
environment, all of the solutions were made anaerobic by several rounds of
freeze-pump-thaw degassing. All spectroscopic studies used a 1-cm light path
cuvette. After blanking against FtmOx1, spectra were recorded for samples to
which anaerobic α-KG had been added. Titration plots were obtained by plotting
the absorption at 520 nm. The titration data were fitted to equation (6)


$$A_{\mathrm{obs}} = A_{\max}\,\frac{[\mathrm{E\text{-}L}]}{n[\mathrm{FtmOx1_T}]} \qquad (6)$$


in which the observed absorption (A_obs) was equal to the maximal
absorption (A_max) multiplied by the concentration of enzyme-ligand complex
([E-L]) divided by the concentration of ligand binding sites (the number (n)
of ligands bound per subunit multiplied by the total concentration of FtmOx1
containing Fe(II) ([FtmOx1_T])). The concentration of enzyme-ligand complex was
obtained using equation (7)


$$[\mathrm{E\text{-}L}] = \frac{\left(K_d + [\alpha\text{-KG}_{\mathrm{T}}] + n[\mathrm{FtmOx1_T}]\right) - \sqrt{\left(K_d + [\alpha\text{-KG}_{\mathrm{T}}] + n[\mathrm{FtmOx1_T}]\right)^2 - 4[\alpha\text{-KG}_{\mathrm{T}}]\,n[\mathrm{FtmOx1_T}]}}{2} \qquad (7)$$


where K_d is the apparent ligand affinity and [α-KG_T] is the total α-KG concentration.
The K_d values were determined from equations (6) and (7) using nonlinear
curve fitting (OriginPro 8 software).
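
To illustrate how equations (6) and (7) are used together, the sketch below evaluates the binding model and recovers K_d from a titration by a simple grid search. This is not the OriginPro procedure used in the study: the grid search is a stand-in for a nonlinear least-squares fit, the data are synthetic, and all function names are hypothetical. Concentrations are in mM.

```python
import math


def complex_conc(kd, ligand_total, sites_total):
    """Equation (7): enzyme-ligand complex from the quadratic binding expression."""
    b = kd + ligand_total + sites_total
    return (b - math.sqrt(b * b - 4.0 * ligand_total * sites_total)) / 2.0


def a_obs(kd, a_max, ligand_total, enzyme_total, n=1.0):
    """Equation (6): observed absorbance for a given total ligand concentration."""
    sites = n * enzyme_total
    return a_max * complex_conc(kd, ligand_total, sites) / sites


def fit_kd(ligand, absorbance, enzyme_total, n=1.0, kd_grid=None):
    """Return (kd, a_max) minimising the sum of squared residuals.

    Kd is scanned over a grid; for each Kd the best A_max follows
    analytically because the model is linear in A_max.
    """
    if kd_grid is None:
        kd_grid = [i * 0.001 for i in range(1, 2001)]  # 0.001 to 2.0 mM
    best = None
    for kd in kd_grid:
        shape = [a_obs(kd, 1.0, l, enzyme_total, n) for l in ligand]
        a_max = (sum(s * y for s, y in zip(shape, absorbance))
                 / sum(s * s for s in shape))
        sse = sum((a_max * s - y) ** 2 for s, y in zip(shape, absorbance))
        if best is None or sse < best[0]:
            best = (sse, kd, a_max)
    return best[1], best[2]


# Synthetic, noise-free titration (hypothetical values): 0.72 mM iron-loaded
# enzyme, Kd = 0.185 mM, A_max = 0.3.
ligand = [0.1, 0.25, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
data = [a_obs(0.185, 0.3, l, 0.72) for l in ligand]
kd_fit, a_max_fit = fit_kd(ligand, data, 0.72)
print(round(kd_fit, 3), round(a_max_fit, 3))
```

The quadratic form matters here because the enzyme concentration is comparable to or larger than K_d, so free and total ligand concentrations cannot be treated as equal.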

Self-hydroxylation in wild-type FtmOx1 protein. The FtmOx1 wild-type protein 
(0.6 mM) was mixed with α-KG (2.0 mM) in the anaerobic Coy chamber to form
the pink species, and the UV-vis spectrum was recorded anaerobically using an
S.I. Photonics CCD-440 spectrophotometer. All spectroscopic studies used a 1-cm
light path cuvette. Upon exposing the above solution to O2, the solution slowly 
changed to a blue colour, and the process was monitored using a Cary Bio UV-vis 
spectrometer. 

Self-hydroxylation in FtmOx1(Y224F) variant. The Y224F-substituted FtmOx1 
(1.1 mM) was mixed with α-KG (4.0 mM) in the anaerobic Coy chamber to form
the binary complex (a pink species), and the UV spectrum was monitored anaer- 
obically using an S.I. Photonics CCD-440 spectrophotometer. All spectroscopic 
studies used a 1-cm light path cuvette. Upon exposing the above solution to O2, 
the solution slowly changed to a blue colour, and the process was monitored using 
a Cary Bio UV-vis spectrometer. 

MS/MS analysis of FtmOx1. The following protein samples were analysed by 
tandem MS: (1) wild-type FtmOx1; (2) wild-type FtmOx1 treated with α-KG and
oxygen; (3) Y224F-substituted FtmOx1; (4) Y224F-substituted FtmOx1 treated
with α-KG and oxygen; and (5) Y224F-substituted FtmOx1 after single-turnover
experiments in the presence of fumitremorgin B, α-KG, and oxygen. These protein
samples (~1.5 nmol) were dissolved in 50 mM ammonium bicarbonate (pH 8.0)
buffer to make a 50 µl solution. Trypsin Gold (Promega US) was added to these
solutions in a 1:50 (w/w) ratio, and the proteins were digested for 18 h at 37°C. A 
C18 Ziptip (Millipore) was then used to desalt each peptide sample. Each digested 
sample (500 fmol) was injected and analysed by liquid chromatography (LC)-MS/ 
MS on either an LTQ-Orbitrap XL mass spectrometer or a Q Exactive Plus Hybrid 
Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific) coupled with 
a Triversa Nanomate system (Advion Biosystems, Inc.), and a nanoACQUITY 
UPLC (Waters) with C18 reversed phase trap (2G-V/MTrap 5 µm Symmetry C18
180 µm × 20 mm) and analytical (1.7 µm BEH130 C18 150 µm × 100 mm) columns.
Mobile phase A consisted of 98:2 water:ACN with 0.1% formic acid (FA)
and mobile phase B contained 98:2 ACN:water with 0.1% FA. Peptide samples
were loaded into the trap column at 2% B with a flow rate of 4.0 µl min⁻¹ for 4 min
and then transferred to the analytical column at 0.5 µl min⁻¹. The gradient was
increased to 40% B over 40 min. For tandem MS analyses, data-dependent high 
resolution higher-energy collisional dissociation mass spectra were acquired in 
the Orbitrap mass analysers and Xcalibur was used for data analysis. To identify 
target peptides, the ProteinProspector program from University of California, 
San Francisco was used to predict the potential product ions and match them to 
MS/MS product ions. The mass spectra were manually examined to verify the 
assignments. 

Pre-steady-state characterization of FtmOx1. Stopped-flow experiments were 
performed on an Applied Photophysics SX20 stopped-flow spectrometer operating
in an MBraun UNilab glove box. To maintain an anaerobic environment, all
of the solutions were prepared in an inert atmosphere box. An oxygen-saturated
buffer solution (100 mM Tris-HCl (pH 7.5)) was mixed with an equal volume of an
oxygen-free solution containing FtmOx1 (0.65 mM), Fe(II) (0.58 mM), α-KG
(12 mM), substrate (0.58 mM), and 20% glycerol to initiate the reaction.
Absorbance scans from 300 to 700 nm were collected with a diode-array detector
at 8°C. The resulting data were processed using SigmaPlot software.

Freeze-quench experiments were performed using a KinTek quench-flow 
instrument. Analogous to the stopped-flow experiments, an oxygen-saturated 
buffer solution (100 mM Tris-HCl (pH 7.5)) was mixed with an equal volume of 
an oxygen-free solution containing FtmOx1 (0.65 mM), Fe(II) (0.58 mM), α-KG
(12 mM), substrate (0.58 mM), and 20% glycerol to initiate the reaction at 8°C.
The resulting reaction was terminated by injection of the solution into liquid
ethane (−90°C) at various time points. The reaction time of a freeze-quenched
sample is the sum of the ageing time and the quench time. The ageing time was 
the transit time for the reaction mixture through the ageing hose. The quench 
time corresponded to the time required after injection into the cryosolvent for 
the reaction mixture to be cooled sufficiently to prevent further reaction and was 
estimated as ~5 ms (ref. 63). 

The chemical-quench-flow experiments were performed using a KinTek 
quench-flow instrument. Analogous to the freeze-quench experiments, an 
oxygen-saturated buffer solution (100 mM Tris-HCl (pH 7.5)) was mixed with 
an equal volume of an oxygen-free solution containing FtmOx1 (0.65 mM), Fe(II)
(0.58 mM), α-KG (12 mM), substrate (0.58 mM), and 20% glycerol to initiate
the reaction at 8°C. The resulting reaction was terminated by injecting the 
solution into a microcentrifuge tube containing 4x volumes of acetone at the 
desired reaction times. Before HPLC analysis, the samples were centrifuged to 
remove protein, and the supernatant was concentrated by rotatory evaporation. 
The concentrated samples were subjected to HPLC separation on a C18 column 
(4.6 × 100 mm). A linear gradient of 30-100% (v/v) acetonitrile in water was run
for 25 min with a flow rate of 0.5 ml min⁻¹, followed by 100% (v/v) acetonitrile
for 3 min. Before the next injection, the column was equilibrated with 30% (v/v) 
acetonitrile for 2 min. The substances were detected with a Photo Diode Array 
detector at 300 nm. 

X-band (9.64 GHz) EPR spectra were recorded on a Bruker E500A spectrom- 
eter equipped with an Oxford ESR 910 cryostat for low-temperature measure- 
ments. The microwave frequency was calibrated with a frequency counter, and 
the magnetic field was calibrated with an NMR gaussmeter. The temperature 
of the X-band cryostat was calibrated with a carbon-glass resistor temperature 
probe (CGR-1-1000 LakeShore Cryotronics). For all EPR spectra, a modulation 
frequency and amplitude of 100 kHz and 1 mT were used. The EPR spectral sim- 
ulations were performed using the simulation software Spin Count developed 
by one of the authors. 
Extended Data Figure 1 | Characterization of FtmOx1-α-KG complex.
a, Wild-type FtmOx1 and α-KG binding curve. The increase in absorbance at
520 nm as a function of α-KG concentration when it was added to a solution
of wild-type FtmOx1 (0.9 mM) and Fe(II) (0.72 mM) is plotted. On the basis of
the equations described in the Methods (determining the α-KG dissociation
constant), the Kd for wild-type FtmOx1 and α-KG is ~185 ± 35 µM.
b, Y224F-substituted FtmOx1 and α-KG binding curve. The increase of
absorbance at 520 nm as a function of α-KG concentration when it was added
to a solution of Y224F-substituted FtmOx1 (0.9 mM) and Fe(II) (0.7 mM)
is plotted. Kd for Y224F-substituted FtmOx1 and α-KG is ~198 ± 58 µM.
c, Y224A-substituted FtmOx1 and α-KG binding curve. The increase of
absorbance at 520 nm as a function of α-KG concentration when it was added
to a solution of Y224A-substituted FtmOx1 (0.7 mM) and Fe(II) (0.51 mM) is
plotted. Kd for Y224A-substituted FtmOx1 and α-KG is 204 ± 43 µM. In a-c,
the Kd was calculated based on the concentration of iron-loaded FtmOx1.
The experiments were replicated three times and error bars represent s.e.m.
Extended Data Figure 2 | Suppression of DOPA formation by the
presence of substrate fumitremorgin B. There is no immediate evidence
for the formation of DOPA upon the exposure of the FtmOx1-α-KG
complex to O2 when the substrate fumitremorgin B is present. Spectra were
recorded after FtmOx1 was used as the control to blank the UV-visible
absorption reading.
Extended Data Figure 3 | HPLC chromatograms of the FtmOx1 reaction
enzyme-concentration dependence. Chromatograms of FtmOx1 reactions
with increasing amounts of FtmOx1 relative to the amount of substrate.
The reaction mixture contained 100 mM Tris-HCl (pH 7.5), 180 µM
fumitremorgin B, 2 mM α-ketoglutarate, and variable amounts of FtmOx1.
Identities of the peaks were assigned based on subsequent NMR and MS
characterizations of the isolated compounds. This experiment indicates that
FtmOx1 is capable of catalysing endoperoxide formation in the absence of
any other reductants.
Extended Data Figure 4 | Stoichiometry determination for α-KG and O2
in FtmOx1 reaction. a, b, Equivalents of endoperoxide products (2 and 3)
produced as a function of the ratio of α-KG to iron-loaded FtmOx1 (a)
and oxygen to iron-loaded FtmOx1 (b). The quantification was conducted
based on the fumitremorgin B (1), compound 2, and compound 3 internal
standards. All calculations were based on the concentration of iron-loaded
FtmOx1. The experiments were replicated three times and error bars
represent s.d.
Extended Data Figure 5 | Structural comparison of the active site
topologies between FtmOx1 and TauD. a, Examination of the alternative
configuration of α-KG in the FtmOx1-α-KG binary complex using the
configuration of α-KG in the TauD-α-KG binary complex. We modelled
α-KG in this alternative binding mode and calculated the difference map.
In the Fo − Fc map, strong positive density (green) and negative density (red)
are shown even when contoured to high level (3.3σ), indicating that this
configuration is not correct for the FtmOx1-α-KG complex. b, The Fo − Fc
map at the active site of the FtmOx1-fumitremorgin-B complex. A model
of the substrate fumitremorgin B is superimposed onto the difference map,
which is contoured at 2.8σ. c, Side-by-side comparison of FtmOx1 and TauD
active-site topologies. In the left panel, the superimposition of the binary
structures of FtmOx1-α-KG and FtmOx1-fumitremorgin-B (1) shows that
the remaining site for oxygen binding and activation is blocked from the
substrate by Y224. d, In contrast, in the structure of the TauD-taurine-α-KG
tertiary complex, the remaining site for O2 binding and activation directly
faces the substrate (taurine).
Extended Data Figure 6 | Characterization of FtmOx1 Y224F variant.
a, Self-hydroxylation reaction in Y224F-substituted FtmOx1. Formation of
DOPA upon exposure of the Y224F-substituted FtmOx1-α-KG complex
to O2. b-e, MS/MS analyses of Y224F-substituted FtmOx1. b, MS/MS
spectrum of the triply charged parent ion at m/z 768.4109 of a tryptic
digested peptide (residues 219-237) from wild-type FtmOx1. c, MS/MS
spectrum of the triply charged parent ion at m/z 763.0793 of a tryptic
digested peptide (residues 219-237) from Y224F-substituted FtmOx1.
d, MS/MS spectrum of the triply charged parent ion at m/z 768.4109
of a tryptic digested peptide (residues 219-237) after exposure of the
Y224F(FtmOx1)-α-KG tertiary complex to O2. e, MS/MS spectrum of the
triply charged parent ion at m/z 773.7426 of a tryptic digested peptide (residues
219-237) for DOPA formed upon exposure of FtmOx1(Y224F)-α-KG complex
to O2 in the absence of substrate fumitremorgin B.
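
The parent-ion m/z values quoted in this legend can be compared directly: for ions of the same charge, the difference in m/z multiplied by the charge gives the neutral mass difference. The sketch below does this arithmetic; the function name is hypothetical, and the monoisotopic mass of oxygen (15.9949 Da) is a standard value rather than a number taken from the legend.

```python
OXYGEN_MONOISOTOPIC = 15.9949  # Da, standard monoisotopic mass of one O atom


def neutral_mass_shift(mz_from, mz_to, charge):
    """Neutral mass difference (Da) between two ions of the same charge state."""
    return (mz_to - mz_from) * charge


# Triply charged parent ions of the peptide spanning residues 219-237.
y224f_to_wild_type = neutral_mass_shift(763.0793, 768.4109, 3)
wild_type_to_dopa = neutral_mass_shift(768.4109, 773.7426, 3)

print(round(y224f_to_wild_type, 4), round(wild_type_to_dopa, 4))
```

Both differences come out close to the mass of one oxygen atom, which is what would be expected for a Phe/Tyr pair and for the addition of one hydroxyl oxygen to form DOPA.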
Extended Data Figure 7 | Mechanistic model for the production of dealkylation products in FtmOx1 Y224A or Y224F variants.
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Extended Data Figure 8 | Pre-steady-state analyses of FtmOx1 reactions. 
a, HPLC chromatograms for FtmOx1 reactions chemically quenched at the 
indicated times. The reaction mixture in 100 mM Tris-HCl (pH 7.5) buffer 
contained FtmOx]1 (0.65 mM), Fe" (0.58mM), a-KG (12 mM), substrate 
(0.58 mM), and 20% glycerol. The mixture was mixed with O2-saturated 
buffer to initiate the reaction. There is an extra signal next to compound 3, 
which might be due to other chemicals released during the quench process. 
Results from the chemical quench experiment indicate that FtmOx1 catalysis 
is on the timescale of a few seconds per cycle. b, Time-dependent 420 nm 
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absorption change (black solid curve) determined by stopped-flow optical 
absorption spectroscopy and the concentrations of the high-spin Fe** species 
(blue squares) and the g=2 species (red dots) determined in the rapid-freeze- 
quench EPR experiments. The black solid curve is associated with the left 

y axis and is from the average of two stopped-flow trials. The blue squares 
and red dots are associated with the right y axis and are from the average of 
two rapid-freeze-quench EPR experiments. The experiments were repeated 
twice, and error bars reflect the uncertainty of the packing factor of rapid- 
freeze-quench EPR samples, which is around +10%. 
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Extended Data Figure 9 | EPR spectroscopic analyses of FtmOx1 
reactions. a, X-band EPR spectra measured at 19 K in reaction samples 
prepared at the indicated times. The black line shows the sample containing 
the FtmOx1-Fe"-a-KG complex in the absence of Oo. (There is a very small 
signal at g~ 4.3 region, only accounted for by <5 uM iron in the sample, 
which might be due to a very small amount of Fe** from inactive enzyme.) 
Bottom, the reaction sample freeze-quenched at ~0.2s after mixing the 
FtmOx1-Fe"—-a-KG complex with O). It has two signals: an Fe** (g= 4.54, 
4.26, and 3.93) and a radical signal at the g= 2 region. b, X-band EPR spectra 
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measured at 19 K for samples freeze-quenched at the indicated times showing 
the formation of high-spin ferric species on the time scale within 1s. The 
reaction was initiated by mixing the FtmOx1-Fe"-a-KG complex with Op. 
g-values are indicated in the figure. c, X-band EPR spectra measured at 19K 
for samples freeze-quenched at 0.05 s and the spectral simulation for an 
S=5/2 high-spin ferric species. The simulation parameters are: D=0.3 cm! 
E/D=0.266, o(E/D) = 0.03, and g= 4.54, 4.26, 3.93. Measurement conditions 
in a—c: microwave frequency, 9.64 GHz; microwave power, 0.2 mW; 
modulation amplitude, 1 mT; and modulation frequency, 100 kHz. 
Extended Data Table 1 | X-ray crystallography data collection and refinement statistics 

| | FtmOx1 (Se-Met) | FtmOx1 (native) | FtmOx1•α-KG complex | FtmOx1•fumitremorgin B complex |
|---|---|---|---|---|
| Data collection | | | | |
| Space group | P 1 2₁ 1 | P 1 2₁ 1 | P 1 2₁ 1 | P 1 2₁ 1 |
| Cell dimensions | | | | |
| a, b, c (Å) | 60.6, 45.8, 105.2 | 60.4, 45.6, 105.4 | 60.6, 45.8, 105.4 | 60.3, 45.4, 104.8 |
| α, β, γ (°) | 90.0, 100.5, 90.0 | 90.0, 99.7, 90.0 | 90.0, 100.0, 90.0 | 90.0, 100.3, 90.0 |
| Resolution (Å) | 48.23 - 3.49 (3.63 - 3.51)* | 42.91 - 1.95 (2.02 - 1.95) | 36.13 - 2.54 (2.63 - 2.54) | 36.05 - 2.11 (2.19 - 2.11) |
| Rmerge | 0.114 (0.157) | 0.101 (0.725) | 0.125 (0.691) | 0.077 (0.338) |
| I/σI | 13.19 (7.64) | 18.33 (2.02) | 10.53 (1.50) | 17.08 (2.80) |
| Completeness (%) | 99.90 (100.00) | 99.90 (99.22) | 99.93 (99.35) | 99.86 (98.63) |
| Redundancy | 6.1 (3.2) | 6.6 (5.7) | 3.7 (3.7) | 3.7 (3.1) |
| Refinement | | | | |
| Resolution (Å) | | 42.91 - 1.95 (2.02 - 1.95) | 36.13 - 2.54 (2.63 - 2.54) | 36.05 - 2.11 (2.19 - 2.11) |
| No. reflections | | 41711 | 18951 | 32435 |
| Rwork/Rfree | | 0.1643/0.2043 | 0.1756/0.2340 | 0.1670/0.2033 |
| No. atoms | | 4906 | 4704 | 4884 |
| Protein | | 4535 | 4535 | 4535 |
| Ligand/ion | | 25 | 23 | 38 |
| Water | | 346 | 146 | 311 |
| B-factors (Å²) | | 29.5 | 35.8 | 32.4 |
| Protein | | 29.2 | 35.8 | 31.9 |
| Ligand/ion | | 47.5 | 41.0 | 49.6 |
| Water | | 32.0 | 35.3 | |
| R.m.s. deviations | | | | |
| Bond lengths (Å) | | 0.005 | 0.017 | 0.004 |
| Bond angles (°) | | 0.97 | 0.93 | 0.87 |

*Highest-resolution shell is shown in parentheses. 
CORRECTIONS & AMENDMENTS 


CORRIGENDUM 
doi:10.1038/nature15533 


Corrigendum: A basal ichthyosauriform with a short snout from the Lower Triassic of China 


Ryosuke Motani, Da-Yong Jiang, Guan-Bao Chen, 
Andrea Tintori, Olivier Rieppel, Cheng Ji & Jian-Dong Huang 


Nature 517, 485-488 (2015); doi:10.1038/nature13866 


The data matrix in the original Supplementary Data 3 of this Letter 
reproduced the tree topology shown in Extended Data Fig. 3 but the 
accompanying character descriptions did not match the coding given 
in the data matrix. (The numbering of character states was shifted by 1 
because of a typo that occurred while editing the list in a spreadsheet, 
and the character state 3, which was erroneously numbered 4, was 
accidentally omitted from the list.) The Supplementary Information 
accompanying this Corrigendum contains the revised Supplementary 
Data 3. The revised Supplementary Data 3 also reproduces the tree 
topology shown in Extended Data Fig. 3 of the original Letter; note 
that the characters have been reordered in the revised Supplementary 
Data 3 for anatomical consistency. 

In addition, the tree statistics originally published in the Extended 
Data Fig. 3 legend were wrong because they were derived from a matrix 
where Parvinatator was removed from the original matrix. The correct 
statistics reflecting all 56 taxa in the original matrix are 243 equally 
most parsimonious trees of TL = 529, CI = 0.423 and RI = 0.796. 


Supplementary Information is available in the online version of this Corrigendum. 

CORRIGENDUM 
doi:10.1038/nature15737 


Corrigendum: Influence maximization in complex networks through optimal percolation 


Flaviano Morone & Hernan A. Makse 


Nature 524, 65-68 (2015); doi:10.1038/nature14604 


In the Acknowledgements section of this Letter, ‘ARL’ should read 
‘Army Research Laboratory Cooperative Agreement Number 
W911NF-09-2-0053 (the ARL Network Science CTA)’. This has been 
corrected in the online versions of the paper. 

RETRACTION 
doi:10.1038/nature15745 


Retraction: Non-blinking semiconductor nanocrystals 


Xiaoyong Wang, Xiaofan Ren, Keith Kahen, Megan A. Hahn, 
Manju Rajeswaran, Sara Maccagnano-Zacher, John Silcox, 
George E. Cragg, Alexander L. Efros & Todd D. Krauss 


Nature 459, 686-689 (2009); doi:10.1038/nature08072 


In this Letter, we reported the unusual non-blinking characteristics of 
the fluorescence from individual CdZnSe/ZnSe alloyed quantum dots. 
However, it has recently come to our attention that similar fluorescence 
behaviour was seen by Celso de Mello Donega, Daniel Vanmaekelbergh 
and co-workers from a single fluorophore on bare silica glass. In 
particular, individual fluorescence spots from single molecules were 
found to be non-blinking, and fluorescence spectra looked similar 
to what we reported in our Letter. We corroborated their findings 
by conducting experiments of our own on bare quartz coverslips, and 
on quartz coverslips coated with polymethyl methacrylate (PMMA). 
Although these same control experiments were performed by us before 
publication, this time we clearly observed non-blinking fluorescence 
from isolated spots on the coverslip. Furthermore, the fluorescence 
spectra from these spots were in all practical respects identical to 
what we reported in our Letter. Subsequent investigations by us have 
revealed that the surprising origins of the unusual fluorescence come 
from individual, molecular defects in silica glasses, brightened by 
the polymer coating. The details of these new findings will be the 
subject of future publications¹. After examining the data of de Mello 
Donega and colleagues, and determining that we were both observing 
the same phenomena, we concluded that we cannot attribute the 
fluorescence we observed to CdZnSe/ZnSe quantum dots. In view 
of these new results, we therefore wish to retract the paper and sincerely 
apologize for our error. All authors agree with the decision to 
retract the paper with the exception of X.R., who was unable to be 
contacted. 


1. Rabouw, F. et al. Non-blinking single-photon emitters in silica. Preprint at: 
http://arXiv.org/abs/1509.07262 (2015). 
SCIENCE PICTURE CO./SPL 


ANTIBODY ANARCHY: 
A CALL TO ORDER 


Antibodies used in research often give murky results. 
Broader awareness and advanced technologies promise clarity. 


Antibodies, with their distinctive Y-shape, are among the most widely used — and most vexing — reagents in biology. 


BY MONYA BAKER 


A mouse first alerted Clifford Saper to the fact that antibodies were 
misleading the scientific community. As 
editor-in-chief of the Journal of Comparative 
Neurology between 1994 and 2011, he handled 
scores of papers in which scientists relied on 
antibodies to flag the locations of neurotransmitters 
and their receptors. Around the turn 
of the century, related investigations began to 
roll in from researchers using knockout mice, 
animals genetically engineered to not express 
a target gene. The results were unsettling. Antibody 
staining in knockout animals should 
have shown radically different patterns from 
those in unmodified animals. But all too often 
the images were identical. “As we saw more and 
more retractions due to this, I began to realize 
that we had no systematic way to evaluate 
papers that used antibodies,” recalls Saper, now 
chair of neurology at Beth Israel Deaconess 
Medical Center in Boston, Massachusetts. 

Thus began a one-journal revolution. Saper 
and his editorial colleagues set up a policy 
of requiring extensive validation data on each 
antibody. The policy was good for rigour, but 
not submissions, he recalls. “Many authors 
were caught in the middle, and found it easier 
to publish their papers elsewhere.” But Saper 
persisted. His efforts eventually culminated 
in the JCN Antibody Database, an inventory 
of a few thousand antibodies that can be 
trusted for neuroanatomy. 

Today, biomedical researchers still collect 
tales of antibody woe faster than country-music 
labels spin out sad songs. The most common 
grumble is the cheating reagent: the antibody 
purchased to detect protein X surreptitiously 
binds protein Y (and perhaps ignores X altogether). 
Another complaint is ‘lost treasure’: a 
run of promising experiments that stalls when a 
new batch of antibodies fails to reproduce previous 
findings (see ‘A market in a bind’). 

But technological advances and shifts in 
the scientific community now promise to cut 
through this antibody quagmire. 

Antibodies are ubiquitous tools in the life 
sciences. Perhaps their most popular use is 
in western blotting to reveal the presence of a 
particular protein in cells or tissue samples, but 
they are also used to visualize proteins under 
the microscope by immunohistochemistry 
and immunofluorescence, as well as in many 
other applications that stem from an antibody’s 
presumed ability to bind specific biomolecules. 
A 2015 report from online purchasing portal 
Biocompare puts the market for research antibodies 
at US$2.5 billion a year and growing. 
The choice is dazzling: there are hundreds of 
vendors supplying products. 

It is alarming, then, to discover that antibodies 
can be unreliable reagents. Insufficient 
specificity, sensitivity and lot-to-lot consistency 
have resulted in false findings and wasted 
efforts. Antibody unreliability has taken its toll 
across studies in cancer, metabolism, ageing, 
immunology and cell signalling, and in any 
field concerned with researching complex 
biomolecules. The waste, in terms of time and 
resources, is colossal. Losses from purchasing 
poorly characterized antibodies have been estimated 
at $800 million per year, not counting 
the impact of false conclusions, uninterpretable 
(or misinterpreted) experiments, wasted 
patient samples and fruitless research time. 
Mathias Uhlén, a protein researcher at the 
Royal Institute of Technology in Stockholm, 
says that frustration with research antibodies 
has been building for years and that the time 
is finally ripe for improvements. “There is a 
big interest in the community to clean this up.” 


A market in a bind 


An antibody that performs differently across 
experiments can cause calamity. But the 
performance of these reagents is linked to 
how they are manufactured. 

Polyclonal antibodies are made by 
collecting the blood of an animal immunized 
with the target antigen. Any particular lot 
will therefore only be available as long as 
the animal lives. To produce monoclonal 
antibodies, a host animal is immunized 
with the target protein or relevant portion 
of it, then the B lymphocytes that recognize 
and respond to that antigen are fused to 
a myeloma cell line that can be cultured 
indefinitely to produce the desired antibody. 

Recombinant antibodies are unlike 
traditional monoclonals because they 
can be manufactured without animals. 
Instead, these antibodies are made by 
identifying an exact gene sequence for 
an antibody, either by sequencing an 
animal’s immune cells to find those that 
produce antibodies with highest affinity 
for the target, or sequentially shuffling 
gene sequences and testing the resultant 
proteins. That gene can then be introduced 
into an appropriate cell line to produce 
antibodies. Because the identity of the 
antibody is precisely defined, the cell line 
can be regenerated if the original colony 
dies or mutates. 

The pursuit of antibody quality has 
inspired two publicly funded initiatives 
aimed at generating collections of validated 
antibodies and other protein-binding 
reagents. These produced thousands 
of new binders, but the Protein Capture 
Reagents programme, which launched 
in 2010, is already winding down, as is 
the European Union-funded Affinomics 
consortium, which launched in 2007 (ref. 8). 
Advocates say that the chosen targets, such 
as transcription factors, were particularly 
problematic and that further investments in 
such reagents would yield larger pay-offs. 

Meanwhile, polyclonals command a 
large swathe of the market. A project that 
profiled reagents used across 10,000 
biomedical papers published since 2006 
found references to 1,293 polyclonals, 
755 monoclonals and only 1 recombinant. 
Some researchers think that polyclonal 
antibodies, which can target a protein 
in multiple ways, are not only easy to 
manufacture but also particularly good at 
recognizing proteins in diverse contexts. 

Eric McIntush is chief scientific officer of 
Bethyl Laboratories in Montgomery, Texas, 
which has been selling polyclonal antibodies 
for over 40 years and plans to start selling 
recombinants in 2016. The research world 
needs both, he says. Companies simply 
cannot afford to sink funds into products 
that they may never sell. The widespread 
availability of polyclonals, which are 
currently the least expensive antibody to 
develop, may encourage experiments on 
under-investigated proteins. As targets 
become more defined and are needed for 
translational applications, he says, there will 
be a market for recombinant products. 

But researchers such as Andreas 
Plückthun, a protein engineer at the 
University of Zurich in Switzerland, think 
that polyclonals and monoclonals should 
be eliminated entirely in favour of defined 
binders. He agrees that many proteins are 
not addressed by existing reagents but 
does not see the point in making undefined 
products such as polyclonals. “Why not 
use something where the genes can be 
identified or kept?” he asks. M.B. 


SPURRED TO ACT 

Discontent has spurred action along various 
fronts. In September, Uhlén chaired the 
inaugural meeting for a working group on 
antibody validation hosted by the Human 
Proteome Organization, an international 
consortium based in Vancouver, Canada, that 
supports large-scale projects for understand- 
ing proteins. That same month, the Federation 
of American Societies for Experimental Biol- 
ogy hosted roundtables to explore problems 
with antibodies. It expects to issue recom- 
mendations early next year. The US National 
Institutes of Health (NIH) is also on the case. 
Starting in January next year, grant applica- 
tions must include a new section describing 
efforts to authenticate antibodies and other 
key resources required for experiments. Far- 
reaching solutions are likely to be hammered 
out at a meeting hosted by the Global Biologi- 
cal Standards Institute next September. The 
gathering will be held in Asilomar, California, 
where scientists gathered 40 years ago to set 
cautionary approaches for using recombinant 
genetic technology to manipulate DNA. 

“We're hoping that the community will come 
up with consensus guidelines,” says Jon Lorsch, 
director of the US National Institute of General 
Medical Sciences in Bethesda, Maryland. That 
way, both grant applicants and reviewers will 
have resources to turn to when describing how 
they will authenticate their materials. 

Such resources could take the form of a menu 
of broad-strokes criteria. “We are not talking 
about good and bad antibodies but antibodies 
that work in specific assays and specific 
context,” says Uhlén. Evaluation categories might 
include knockdown and knockout approaches 
to reveal whether an antibody still binds even 
in the absence of the target protein. Another 
approach would be to tag a target protein with 
a fluorescent marker to reveal whether the 
antibody also binds untagged proteins. A third 
category could compare a new antibody with 
a well-characterized one. Finally, researchers 
could run the antibody and whatever it binds 
through a mass spectrometer to analyse bound 
molecules for the expected protein fragments. 

Several vendors have announced their own 
characterization efforts, and new technologies 
are helping. Alan Hirzel, chief executive officer 
of Abcam, a life-sciences reagents provider in 
Cambridge, UK, says that to verify that its commercial 
antibodies perform as expected, the 
company is using a genome editing method 
called CRISPR-Cas9, which makes precise 
changes in DNA. The company is testing antibodies 
on human cell lines in which target 
genes have been disrupted by CRISPR-Cas9 
and then posting results for each reagent tested. 


Pairs of antibodies can be designed to signal (red) only when both detect the same target protein. 

“We now really have the technologies we 
need that allow us to carry out those characterizations, 
whereas 5 or 10 years ago, we simply 
didn’t,” says Klaus Lindpaintner, chief scientific 
officer at Thermo Fisher Scientific, a life-sciences 
tools provider in Waltham, Massachusetts. 
Those companies with characterization 
data are starting to view this as a competitive 
advantage. In June this year, life-sciences company 
Bio-Rad in Hercules, California, launched 
a line of antibodies that have been tested for 
off-target activity in western blots against 12 
different cell lines. 
Since mid-2014, Proteintech, an antibody 
manufacturer in Chicago, Illinois, has been 
using small interfering RNA to knock 
down gene expression in each new antibody 
product, assessing whether the signal 
subsides with the expression of the target gene. 
Such efforts are nascent, however, with only a 
tiny fraction of companies’ catalogues being 
subjected to validation. 

And not all companies disclose the specific 
conditions of testing, or whether an antibody 
has performed poorly under those conditions, 
says Gordon Whiteley, lab director at the NIH’s 
Antibody Characterization Program, which 
aims to create reliable antibodies for use in 
cancer biology. The example his programme 
sets in terms of supplying testing protocols and 
resulting data could be just as important as the 
reagents themselves, he says. 

There will be no single best way to test 
antibodies, says Roberto Polakiewicz, chief scientific 
officer of Cell Signaling Technology, an 
antibody manufacturer in Danvers, Massachusetts. 
“Developing an antibody is a scientific 
endeavour. You need people who know what 
experiments to do to validate an antibody.” If 
customers cannot see the data and make their 
own judgements, they need to look for a new 
antibody, he says. 

But researchers sometimes take only a 
cursory look at data, and many do not realize 
that antibodies’ performance in a given tissue 
or application, such as western blotting, says 
little about whether it will work in other sorts 
of experiments. 

And commercial providers cannot guarantee 
that a given antibody will work for every tissue 
type and experimental condition, warns Paul 
Sawchenko, a neuroscientist at the Salk Institute 
in San Diego, California. “Unless one is so fortunate 
as to have had someone else demonstrate 
specificity in the same tissue from the same species 
under the same experimental conditions, 
you should be obliged to do this yourself.” 


VITAL INFORMATION 

It would be more efficient to learn from other 
researchers’ work, but fewer than half of the 
publications that describe antibody experiments 
report which specific reagent was actually 
used. Even when authors do include a 
catalogue number, companies may discontinue 
products and sell off lines, making them hard 
to track, says Anita Bandrowski, an informa- 
tion scientist at the University of California, 
San Diego. Bandrowski is group leader at the 
Resource Identification Initiative, an NIH- 
backed programme involving a diverse group of 
academic collaborators. The initiative has been 
instrumental in establishing unique identifiers 
for antibodies and persuading dozens of jour- 
nals to ask authors to specifically name which 
antibodies they are using. 

Three antibodies (green) against the same mitochondrial protein. The unexpected pattern on the right shows the third antibody binds an unintended protein. 


Information is beginning to accumulate. 
More than two dozen web portals have sprung 
up to help researchers select antibodies. Some 
collect user reviews on antibody performance 
and offer comparison tools. The Antibody 
Validation Channel, a project of the scientific 
publisher F1000, allows researchers to post 
their accounts and even request peer review. 
Biocompare has hired a content editor whose 
sole focus is to reach out to the research com- 
munity and get them to write reviews. 

Some antibody suppliers, such as St John’s 
Laboratory in London, offer researchers free 
products in exchange for testing and sharing 
the results. Antibodies-online, a marketplace 
for antibodies, arranges for an independent 
third party to perform validation. At 
Antibodypedia’s knockdown initiative, launched 
in September, life scientists can earn hundreds 
of dollars in free reagents if they submit data 
showing that gene-silencing reagents such as 
small interfering RNA or CRISPR-Cas9 elimi- 
nate an antibody signal for a given target. 

But many scientists are wary of information 
from anonymous reviews. Data supplied by 
both users and companies can be sparse, and 
some projects share data only if they confirm 
that an antibody works as expected. “Sometimes 
it seems easier to hire a detective than to 
order a specific antibody,” concludes an overview 
of antibody portals. 


FUTURE ASSESSMENTS 

Some researchers are developing mechanisms to 
compare antibodies directly. Aled Edwards at the 
University of Toronto, Canada, is director of the 
international Structural Genomics Consortium 
(SGC). He and his SGC colleagues used mass 
spectrometry to detect and compare the sets of 
proteins pulled down by immunoprecipitation 
with more than 1,000 antibodies. The collaboration 
ran across 5 reference laboratories, took 
4 years and cost US$3 million, not counting 
in-kind donations. Ultimately, it established a 
procedure to score antibody quality and share 
quantitative information about its performance, 
specifically for ‘pull-down’ experiments, in 
which proteins are pulled out of solution using 
antibodies. 

Fridtjof Lund-Johansen, a proteomics 
researcher at Oslo University Hospital in Norway, 
is developing an ambitious bead assay 
that tests thousands of antibodies at once. The 
plan is to separate cellular proteins into many 
different fractions, then profile the proteins in 
each fraction using two different methods. One 
is mass spectrometry and the other is a bead-based 
array with thousands of antibodies. The 
mass spectrometry data serve as a reference for 
the results obtained with antibodies. Turning the 
idea into a refined assay will take considerable 
work, Lund-Johansen admits. “It is extremely 
ambitious. It is totally crazy, but it is the only 
way to go.” Other scientists are intrigued at the 
approach but wonder if it will predict antibody 
performance in common techniques. 

Blanket assessments of antibodies can be 
overinterpreted, says Ulf Landegren, a proteomics 
technology developer at Uppsala University 
in Sweden. “It is far more meaningful to discuss 
the ability of assays to detect the correct protein, 
rather than whether antibodies or other binders 
bind the right protein.” A case in point is 
cross-reactivity, when an antibody binds proteins 
other than its specified target. Cross-reactivity 
depends not just on a particular antibody, but 
also on the complexity of a sample, the concentration 
of the antibody and the rarity of the 
target protein. He recommends that rather than 
relying on a single antibody, researchers should 
instead test antibodies in pairs that are designed 
to bind to different parts of a target protein. 
Parts of a sample labelled with both reagents are 
less likely to represent off-target binding. 

One problem with this approach is that it is 
hard for scientists to know if they are purchas- 
ing different antibodies. Vendors often obtain 
products from different sources and are not 
required to disclose the original manufacturer. 
As a result, researchers who want to compare 
several antibodies may end up comparing 
identical products sold by several vendors. A 
handful of companies, including Genlogica 
and One World Laboratories, both in San 
Diego, California, only sell products labelled 
by the original manufacturer and offer ‘trial 
size’ antibody batches so that researchers can 
test products side by side in their labs. 

The toughest challenge is not so much in 
antibody characterization but in persuading cell 
biologists to hold back on using antibodies until 
these are thoroughly evaluated, says Edwards, 
although he doubts that scientists will become 
savvier unless funders and publishers force the 
issue. “Right now we have an unregulated market, 
where you don't have to have any quality 
to sell your product.” In other words, he says, 
guidelines, characterization data and conscientious 
vendors only matter if researchers invest 
effort into selecting reagents. 


Monya Baker writes and edits for Nature in 
San Francisco, California. 
The Technology Feature ‘Connectomes 
make the map’ (Nature 526, 147-149; 
2015) misnamed the MultiSEM model 
and gave the wrong citation in reference 
3. MultiSEM 505 should have been Zeiss 
MultiSEM, and ref. 3 should have referred to 
Zingg, B. et al. Cell 156, 1096-1111 (2014). 
SCIENCE FICTION 


BY S. J. ROSENSTEIN 


Hey, there. How you doin’? Yeah, 
me too. My wife likes the 
heat, but it’s too much for 
me. It's why I come here, the best 
thing about this place is the air 
con. No one would say it’s 
the coffee. No offence. It’s 
good enough for me, I’m 
happy as long as I don't 
have to drink any of that 
fancy frou-frou stuff. 
I'll have mine black. 
My wife made me give 
up cream, she won't 
stop going on about 
my cholesterol. 

Yeah, I seen the news. 
Come on now, you don’t 
believe that garbage, do 
you? Them Claimers is as 
crazy as my mom, and she 
thinks the CIA is spying on 
her through her sprinkler. You 
don't think someone would have 
noticed if the president was being 
controlled by aliens? They’ve got to 
have about a dozen doctors and secret- 
service agents watching him the whole 
time. Don't tell me you believe they can see 
the future, too. That’s what they say. Kooks 
and slackers with nothing better to do with 
their lives, sitting in their basements work- 
ing themselves into a frenzy over nothing. 
Look here now, if I was an alien and I could 
see the future, I'd just buy me a lottery ticket 
and retire to Hawaii. 

No, you're just plain wrong. I got more 
right to an opinion than you, cos I know 
what I’m talking about. You think just ‘cos I 
don’t wear a thousand-dollar suit I’ve never 
met the great and the good? I met the presi- 
dent only two weeks ago. Yeah. Yeah, I guess 
it was exciting. Well, the company I work for, 
they make the toilets for the shuttle. The one 
the president used to go up to the alien ship, 
yeah. Well, just on the way back, after he’d 
met the aliens and taken all those photos 
you seen in the papers, the toilet got blocked 
up. They didn’t know how soon they need 
to go back, so I drove up right away. Blew 
through a few stop signs too, I kind of hoped 
a cop would stop me so I could tell them I 
was on the way to fix the president's toilet. 
Didn't happen, though. 

So I got to the spaceport just as the shuttle 
was getting in, and the president got off. I’d 
sort of wormed my way forward so I could 
get on as fast as possible, ‘cos they'd been 
pretty mad about the toilet and I wanted to 
show that even though we're just a tiny company, 
we still give a great service. The president 
looked real tired, and as I was trying 
to get to the door he looked up and saw me, 
and took a few steps towards me, and shook 
my hand. This hand right here touched the 
president. Huh. No, I dunno what that is. 
Little scratch or something. Well, it looks a 
bit weird, but it doesn't feel infected. It’s been 
there a couple of weeks. Nothing to worry 
about. 
Anyway, I think the president thought 
I was a foreign dignitary or something, he 
started talking about the future of our world 
and strategic realignments and trade agree- 
ments, and what a great deal the aliens were 
offering us and how wonderful the future 
was gonna be. I just sort of stood there, 
didn’t really know what to say. One of … 
to him, and he snapped out of it OK. He 
thought it was pretty funny when he realized 
I was there to fix the toilet. We had 
a laugh. Well, no, I didn't. But I didn't 
want to tell him. Would be kind 
of awkward, wouldn't it, meet- 
ing the president and the first 
thing you say is you didn’t 
vote for him? 
So I went in and fixed 
the toilet. Wouldn't you 
know it, one of the 
spooks had managed 
to drop one so big and 
dense it blocked the 
system. No, I don’t 
think it was the presi- 
dent. He just doesn’t 
look like that sort of 
guy. Anyways, there 
you have it. I’ve met 
him, and so I think I have 
more right to an opinion 
than some woman on the 
TV with more hair than brain 
cells. Switch it over, won't you? 
Let’s see the game. 
You a Yankees fan? Nah, me nei- 
ther. Didn't think so, not down here, but 
it doesn’t hurt to ask. Not like I'd cheer the 
Red Sox either, but you gotta feel sorry for 
their fans. Mind you, they'll be happy this 
time. Well, I know it looks like that, but they 
ain't gonna lose. That guy coming up to the 
plate, he'll hit a home run. Don’t ask me how 
I know, I just know. Same way you wake up 
in the morning sometimes and you know it’s 
gonna be the sort of day when a bird craps 
on you. Only, like, stronger. There he goes. 
See, didn’t I tell you? You should have put 
money on it. 

Well, I got to be going. Look here, no hard 
feelings, eh? Sorry I was a bit touchy, I just 
get so mad when I hear the nonsense peo- 
ple are spouting. Shake on it? Oh, I’m sorry. 
Must have scratched you with my ring or 
something. Don't worry, it'll heal right up. 
Well thanks, I’m glad to hear you say that. It 
sure makes me feel good, knowing someone 
like me can change the way someone else 
thinks. You have a good day too. 


S. J. Rosenstein is a research scientist with 
a secret identity as a writer, although both 
incarnations wear glasses and neither are 
particularly mild mannered. She complains 
about life at alackoftheologyandgeometry.wordpress.com. 


OVARIAN CANCER: BEYOND RESISTANCE 


Successfully treating the cancer requires 
overcoming the almost inevitable 
development of resistance to standard 
platinum-based therapy. 


BY DAVID HOLMES 


Ovarian cancer is the most common cause of 
gynaecological-cancer-associated death. Although the past 
40 years have seen our knowledge of the disease advance, 
translating that improved understanding into a tangible clini- 
cal benefit has been a tortuous process. The last major break- 
through in treatment came 20 years ago, with the addition of 
a taxane (paclitaxel or docetaxel are commonly used to treat 
ovarian cancer) to one of the several variants of platinum-based 
chemotherapy that remain the mainstay of treatment. Since 
then, refinements to surgery and to the timing and delivery 
of chemotherapy have produced only slight improvements in 
outcomes. In the United States, for example, 5-year survival has 
inched up from about 40% in 1985 to a still parlous 45% today. 
By comparison, 5-year survival for breast cancer stands at 90%. 

Two factors account for much of the stubbornly high mor- 
tality and morbidity associated with ovarian cancer — late 
diagnosis and treatment resistance. There are currently no 
approved methods to screen for ovarian cancer, although 
promising preliminary results released earlier this year from 
the UK Collaborative Trial of Ovarian Cancer Screening may 
begin to change that. Around 60% of women are diagnosed 
with late-stage disease that has already spread within the 
abdomen. As many as 80% of these women will respond well 
to initial treatment with platinum-based therapy, but almost 
all will experience multiple recurrences of disease, with ever 
shorter disease-free intervals. Ultimately, almost all of these 
women will die from the disease, and most will die from a 
disease that is resistant to platinum chemotherapy. 

As with most solid malignancies, resistance to platinum- 
based treatment can be intrinsic or acquired, and is brought 
about through a bewildering array of mechanisms. From 
pumps that eject the drug from the cell to the increased 
expression of genes that enable alternative growth pathways, 
cancer cells leave no stone unturned in their bid to survive and 
proliferate. Further complicating matters is the difficulty of 
knowing which mechanism or mechanisms are active in any 
particular person. 

The good news is that there are a huge number of experi- 
mental therapies in development, and it is hoped that these can 
be added to platinum-based chemotherapy to help deliver a 
knockout blow, or at least prolong the intervals between treat- 
ment and improve patients’ quality of life. Vaccines to activate 
the immune system against tumours, agents to interfere in 
DNA-repair pathways, and therapies that choke off the supply 
of blood to the tumour are all now in clinical trials for ovar- 
ian cancer. Because of the complexity of the disease and the 
mechanisms that underlie treatment resistance, it is unlikely 
that any one therapy will be a silver bullet. Nevertheless, there 
is a growing sense of optimism that researchers will be able to 
translate hard-won knowledge into improved outcomes for 
patients. 

Nature is pleased to acknowledge the financial support of 
Pharma Mar, S.A. in producing this Outline. As always, Nature 
retains sole responsibility for all editorial content. 


OUTLINE: OVARIAN CANCER 
For an animated version of this graphic visit: 
go.nature.com/ghn2pe 


Ovarian cancer is difficult to treat, largely because tumours are often found late and develop 
resistance to initial treatment: platinum-based therapy. New approaches promise to break 
through the platinum barrier. By David Holmes; illustration by Lucy Reading-Ikkanda. 


BIG PROBLEM, LITTLE PROGRESS 
BAD TIMING 

The earlier that ovarian cancer is identified, the better the odds are that treatment will be 
successful. Women are not screened because current methods are not reliable enough to predict 
whether or not women have the disease. Early symptoms of ovarian cancer are often confused with 
irritable bowel or premenstrual syndrome, so most people are diagnosed with late-stage disease. 


THE TOLL OF RESISTANCE 

Of US women diagnosed with ovarian cancer, 60% have late-stage disease. Most of these initially 
respond well to treatment with a combination of paclitaxel (a drug that interferes with cell 
division) and carboplatin (a platinum-based drug that damages cancer-cell DNA). However, more 
than half will relapse within 18 months of diagnosis. 

60% of cases are diagnosed as late-stage disease ... 
...and 25% of these women will have recurrence with a 
platinum-resistant tumour within the first 12 months. 


SLOW PROGRESS 

The late stage at which most ovarian cancers are diagnosed, the fact that such a high proportion 
become resistant to platinum-based chemotherapy, and the small number of approved alternatives 
to platinum therapy, mean that ovarian cancer has a relatively low five-year survival rate. In 
the United States, for example, it is just 45.6%. 

The proportion of women with ovarian cancer who survive five years or more after diagnosis has 
changed little in more than a decade. The outlook is bleaker than for women with breast cancer. 


SOURCES: GLOBOCAN.IARC.FR; SEER.CANCER.GOV 


*Age standardized estimate 


THE ROOTS OF RESISTANCE 


Most researchers agree that, in common with many cancers, a small population of platinum-resistant cancer cells exists 
in ovarian tumours before treatment and flourishes once treatment has killed their platinum-sensitive counterparts. This 
results in regrowth of the tumour, and a low probability that it will respond to further treatment with platinum-based drugs. 


The tumour is made up of platinum-sensitive cancer cells (purple), and a small population 
of platinum-resistant cells (green). 

Platinum enters cancer cells where it binds to and damages DNA. In platinum-sensitive cells, 
this results in programmed cell death. 

Death of platinum-sensitive cells 

Platinum-resistant cells multiply uncontrollably, forming a resistant tumour. 

The platinum-resistant cells use a complex repertoire of mechanisms to mitigate the effects 
of platinum therapy, including DNA damage repair, decreased drug uptake, increased platinum 
removal and sequestration of the metal into lysosomes. 


THE NEW WAVE OF THERAPY 


From priming the immune system to fight ovarian tumours to cutting off the cancer’s blood supply, researchers 
are testing a variety of ways to overcome resistance to platinum-based chemotherapy. 


SCRAMBLING THE CODE 


Ramping up DNA-repair pathways 
is one of the ways that cancer cells 
resist the DNA-damaging effects 
of platinum. If those DNA-repair 
pathways could be dampened down 
it might be possible to resensitize 
cancer cells to platinum. There are 
several drugs in development that 
aim to do just that. PARP inhibitors 
disrupt the mechanism by which 
damaged parts of DNA are removed, 
and the drug trabectedin binds 
directly to and damages the DNA. 
Both have shown early promise. 

The drug topotecan blocks the 
action of the enzyme TOP1, which 
helps to repair DNA damage, and is 
already licensed for the treatment of 
recurrent ovarian cancer. However, its 
effect on overall survival is limited. 


DNA damage and disruption of 
DNA-repair mechanisms lead to 
cell death. 


IMMUNE BOOSTERS 


Priming the immune system to 
recognize and attack cancer cells 
might be an effective way of stunting 
the growth of tumours in people with 
recurrent ovarian cancer. A UK trial 
called TRIOC is testing whether the 
TroVax vaccine, which has this priming 
effect, can boost an individual’s 
anticancer immune response enough 
to slow the growth of recurrent ovarian 
tumours and delay the need for a 
second line of chemotherapy. In the 
trial, the vaccine is given to people who 
have high levels of a marker called 
CA125 in their blood, which indicates 
that a cancer may have returned. 


The vaccine-primed immune system 
releases antibodies and T cells to 
bind to antigens on the cell surface. 


HORMONE THERAPY 


Similar to many breast cancers, some 
ovarian cancer cells have oestrogen 
receptors on their surface and may 
require the hormone to grow and 
spread. This has led researchers 
to test the hormone treatment 
tamoxifen, which is often used to 
treat oestrogen-receptor-positive 
breast cancers, in women with 
advanced ovarian cancer. Tamoxifen 
blocks oestrogen from reaching the 
cells and has been shown to work 
for a small proportion of women 
with recurrent cancer that does not 
respond to chemotherapy. Several 
other hormone treatments, such as 
letrozole and anastrozole, are also in 
clinical trials. 


Tamoxifen competes with oestrogen 
to bind to oestrogen receptors, 
preventing oestrogen-induced cell 
division and tumour growth. 


STARVING THE TUMOUR 


Several treatments are in clinical 
trials to assess whether blocking the 
blood supply to tumours can slow 
down their recurrence. The antibody 
bevacizumab prevents the formation 
of new blood vessels by inhibiting 
activity of the signalling protein VEGF, 
which is involved in the growth of 
blood vessels. The drug has already 
been approved by the US Food 
and Drug Administration and the 
European Medicines Agency for use 
in combination with chemotherapy 
for platinum-resistant relapsed 
ovarian cancer. Another drug — 
cediranib — disrupts the formation 
of blood vessels around the tumour 
by inhibiting a type of signalling 
protein called tyrosine kinase. In a 
trial called ICON6, the drug increased 
survival by three months compared 
with standard treatment for recurrent 
ovarian cancer. Several other drugs 
that block blood-vessel growth, such 
as combretastatin, pazopanib and 
trebananib, are also in clinical trials. 


Cancer cells release VEGF to 
promote the growth of blood 
vessels around the tumour. Drugs 
that disrupt the VEGF-signalling 
pathway prevent the formation of 
vessels and limit tumour growth. 

Blood vessels 
degenerate, cutting 
off the tumour’s 
nutrient supply and 
causing it to stop 
growing or regress. 


Despite the enormous efforts that 
go into cancer research, the current 
approach to drug discovery is largely 
constrained by what we already know 
about cancer and to some extent the 
imagination of the human mind. 


Unlocking the Potential of 
Marine-inspired Oncology 


PharmaMar is pursuing a different 
approach that leverages nature's 
evolution in the enormous biodiversity 
found in the world’s oceans, discovering 
organisms with unique biophysiology 
and powerful anticancer properties. 


PharmaMar’s rigorous approach to 
research is unlocking the remarkable 
therapeutic potential of marine 
ecosystems to bring new hope for 
better anticancer therapies. 


Pharma Mar, S.A. www.pharmamar.com 